Deflation  Techniques  for  air 
Implicitly  Re-started  Arnoldi  Iteration 

R.B.  Lehoucq 
Danny  C.  Sorensen 


September,  1994 
(revised  February  1995) 


TR94-13 


Report  Documentation  Page 

Form  Approved 

OMB  No.  0704-0188 

Public  reporting  burden  for  the  collection  of  information  is  estimated  to  average  1  hour  per  response,  including  the  time  for  reviewing  instructions,  searching  existing  data  sources,  gathering  and 
maintaining  the  data  needed,  and  completing  and  reviewing  the  collection  of  information.  Send  comments  regarding  this  burden  estimate  or  any  other  aspect  of  this  collection  of  information, 
including  suggestions  for  reducing  this  burden,  to  Washington  Headquarters  Services,  Directorate  for  Information  Operations  and  Reports,  1215  Jefferson  Davis  Highway,  Suite  1204,  Arlington 

VA  22202-4302.  Respondents  should  be  aware  that  notwithstanding  any  other  provision  of  law,  no  person  shall  be  subject  to  a  penalty  for  failing  to  comply  with  a  collection  of  information  if  it 
does  not  display  a  currently  valid  OMB  control  number. 

1.  REPORT  DATE 

FEB  1995 

2.  REPORT  TYPE 

3.  DATES  COVERED 

00-00-1995  to  00-00-1995 

4.  TITLE  AND  SUBTITLE 

5a.  CONTRACT  NUMBER 

Deflation  Techniques  for  an  Implicitly  Re-started  Arnoldi  Iteration 

5b.  GRANT  NUMBER 

5c.  PROGRAM  ELEMENT  NUMBER 

6.  AUTHOR(S) 

5d.  PROJECT  NUMBER 

5e.  TASK  NUMBER 

5f.  WORK  UNIT  NUMBER 

7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

Computational  and  Applied  Mathematics  Department  ,Rice 

University, 6100  Main  Street  MS  134, Houston, TX, 77005-1892 

8.  PERFORMING  ORGANIZATION 

REPORT  NUMBER 

9.  SPONSORING/MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 

10.  SPONSOR/MONITOR'S  ACRONYM(S) 

11.  SPONSOR/MONITOR'S  REPORT 
NUMBER(S) 

12.  DISTRIBUTION/AVAILABILITY  STATEMENT 

Approved  for  public  release;  distribution  unlimited 

13.  SUPPLEMENTARY  NOTES 

14.  ABSTRACT 

15.  SUBJECT  TERMS 

16.  SECURITY  CLASSIFICATION  OF: 

17.  LIMITATION  OF 
ABSTRACT 

18.  NUMBER 

OF  PAGES 

38 

19a.  NAME  OF 
RESPONSIBLE  PERSON 

a.  REPORT 

unclassified 

b.  ABSTRACT 

unclassified 

c.  THIS  PAGE 

unclassified 

Standard  Form  298  (Rev.  8-98) 

Prescribed  by  ANSI  Std  Z39-18 


Deflation  Techniques  within  an  Implicitly  Re-started 

Arnoldi  Iteration* 


R.  B.  Lehoucq 
Computational  and  Applied 
Mathematics  Department 
Rice  University 
Houston,  TX  77251 
lehoucq@ri.ce . edu 


D.  C.  Sorensen 
Computational  and  Applied 
Mathematics  Department 
Rice  University 
Houston,  TX  77251 
sorensen@ri.ce . edu 


February  3,  1995 


Abstract 

A  deflation  procedure  is  introduced  that  is  designed  to  improve  the  conver¬ 
gence  of  an  implicitly  restarted  Arnoldi  iteration  for  computing  a  few  eigenvalues 
of  a  large  matrix.  As  the  iteration  progresses  the  Ritz  value  approximations  of 
the  eigenvalues  of  A  converge  at  different  rates.  A  numerically  stable  scheme  is 
introduced  that  implicitly  deflates  the  converged  approximations  from  the  itera¬ 
tion.  We  present  two  forms  of  implicit  deflation.  The  first,  a  locking  operation, 
decouples  converged  Ritz  values  and  associated  vectors  from  the  active  part  of  the 
iteration.  The  second,  a  purging  operation,  removes  unwanted  but  converged  Ritz 
pairs.  Convergence  of  the  iteration  is  improved  and  a  reduction  in  computational 
effort  is  also  achieved.  The  deflation  strategies  make  it  possible  to  compute  multi¬ 
ple  or  clustered  eigenvalues  with  a  single  vector  restart  method.  A  Block  method 
is  not  required.  These  schemes  are  analyzed  with  respect  to  numerical  stability 
and  computational  results  are  presented. 

AMS  classification:  Primary  65F15;  Secondary  65G05 

Key  Words  :  Arnoldi  method,  Lanczos  method,  eigenvalues,  defla¬ 
tion,  implicit  restarting  . 


1  Introduction 

The  Arnoldi  method  is  an  efficient  procedure  for  approximating  a  subset  of  the  eigen- 
system  of  a  large  sparse  n  X  n  matrix  A.  The  Arnoldi  method  is  a  generalization  of 
the  Lanczos  process  and  reduces  to  that  method  when  the  matrix  A  is  symmetric. 
After  k  steps  the  algorithm  produces  an  upper  Hessenberg  matrix  Hk  of  order  k.  The 
eigenvalues  of  this  small  matrix  Hk  are  used  to  approximate  a  subset  of  the  eigenvalues 

‘This  work  was  supported  in  part  by  ARPA  (U.S.  Army  ORA4466.01),  by  the  Department  of  En¬ 
ergy  (Contract  DE-FG0f-91ER25103)  and  by  the  National  Science  Foundation  (Cooperative  agreement 
CCR-9120008.) 
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of  the  large  matrix  A.  The  matrix  Hk  is  an  orthogonal  projection  of  A  onto  a  partic¬ 
ular  Krylov  subspace  and  the  eigenvalues  of  Hk  are  usually  called  Ritz  values  or  Ritz 
approximations . 

There  are  a  number  of  numerical  difficulties  with  Arnoldi/Lanczos  methods.  In  [33] 
a  variant  of  this  method  was  developed  to  overcome  these  difficulties.  This  technique, 
the  Implicitly  Restarted  Arnoldi  iteration  (iRA-iteration)  may  be  viewed  as  a  truncation 
of  the  standard  implicitly  shifted  QR-iteration  and  shares  a  number  of  its  desirable 
properties.  Because  of  this  connection,  we  are  motivated  to  take  advantage  of  the  well 
understood  deflation  rules  of  the  QR-iteration  and  to  adapt  these  to  the  iRA-iteration. 
These  deflation  techniques  are  extremely  important  with  respect  to  convergence  and 
numerical  properties.  Deflation  rules  have  contributed  greatly  to  the  emergence  of  the 
practical  qr  algorithm  as  the  method  of  choice  for  computing  the  eigen-system  of  dense 
matrices. 

This  paper  introduces  deflation  schemes  that  may  be  used  within  an  IRA-iteration. 
This  iteration  is  designed  to  compute  a  selected  subset  of  the  spectrum  of  A  such  as  the 
k  eigenvalues  of  largest  real  part.  We  refer  to  this  selected  subset  as  wanted  and  the 
remainder  of  the  spectrum  as  unwanted .  As  the  iteration  progresses  some  of  the  Ritz 
approximations  to  eigenvalues  of  A  may  converge  long  before  the  entire  set  of  wanted 
eigenvalues  has  been  computed.  These  converged  Ritz  values  may  be  part  of  the  wanted 
or  the  unwanted  portion  of  the  spectrum.  In  either  case  it  is  desirable  to  deflate  the 
converged  Ritz  values  and  corresponding  Ritz  vectors  from  the  unconverged  portion  of 
the  factorization.  If  the  converged  Ritz  value  is  wanted  then  it  is  necessary  to  keep  it 
in  the  subsequent  Arnoldi  factorizations.  This  is  called  locking.  If  the  converged  Ritz 
value  is  unwanted  then  it  must  be  removed  from  the  current  and  subsequent  Arnoldi 
factorizations.  This  is  called  purging.  These  notions  will  be  made  precise  during  the 
course  of  the  paper.  For  the  moment  we  note  that  the  advantages  of  a  numerically 
stable  deflation  strategy  include: 

•  Reduction  of  the  ivorking  size  of  the  desired  invariant  subspace. 

•  The  ability  to  determine  clusters  of  nearby  eigenvalues  without  need  for  a  Block 
Arnoldi  method  [20,  32], 

•  Preventing  the  effects  of  the  forward  instability  of  the  Lanczos  algorithm  [28,  39]. 

The  fundamentals  of  the  Arnoldi  algorithm  are  introduced  in  §  2  as  well  as  the 
determination  of  Ritz  value  convergence.  The  IRA-iteration  is  reviewed  in  §  3.  Deflating 
within  the  IRA-iteration  is  examined  in  §  4.  The  deflation  scheme  for  converged  Ritz 
values  is  presented  in  §  5.  The  practical  issues  associated  with  our  deflation  scheme 
are  examined  in  §  6.  These  include  block  generalizations  of  the  ideas  examined  in  §  5 
for  dealing  with  clusters  of  Ritz  values,  avoiding  the  use  of  complex  arithmetic  when 
a  complex  conjugate  pair  of  Ritz  values  converges.  An  error  analysis  of  the  deflated 
process  in  presented  in  §  7.  A  brief  survey  of  other  deflation  strategies  is  given  in  §  8. 
An  interesting  connection  with  the  various  algorithms  used  to  re-order  a  Schur  form  of 
matrix  is  presented  in  §  9.  Numerical  results  are  presented  in  §  10. 

Capital  and  lower  case  letters  denote  matrices  and  vectors  while  lower  case  Greek 
letters  denote  scalars.  The  j-th  canonical  basis  vector  is  denoted  by  ej.  The  norms 
used  are  the  Euclidean  and  Frobenius  denoted  by  ||  •  ||  and  ||  •  ||^,  respectively. 
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2  The  Arnoldi  Factorization 


Arnoldi’s  method  [2]  is  an  orthogonal  projection  method  for  approximating  a  subset 
of  the  eigensystem  of  a  general  square  matrix.  The  method  builds,  step  by  step,  an 
orthogonal  basis  for  the  Krylov  space, 

ICk(A,v i)  =  span{vi,  Avi, . . .,  Ak~lv+, 

for  A  generated  by  the  vector  v\.  The  original  algorithm  in  [2]  was  designed  to  reduce 
a  dense  matrix  to  upper  Hessenberg  form.  However,  the  method  only  requires  knowl¬ 
edge  of  A  through  matrix  vector  products,  and  its  ultimate  value  as  a  technique  for 
approximating  a  few  eigenvalues  of  a  large  sparse  matrix  was  soon  realized.  When  the 
matrix  A  is  symmetric  the  procedure  reduces  to  the  Lanczos  method  [24]. 

Over  a  decade  of  research  was  devoted  to  understanding  and  overcoming  the  numer¬ 
ical  difficulties  of  the  Lanczos  method  [27].  Development  of  the  Arnoldi  method  lagged 
behind  due  to  the  inordinate  computational  and  storage  requirements  associated  with 
the  original  method  when  a  large  number  of  steps  are  required  for  convergence.  Not 
only  is  more  storage  required  for  Hk  when  A  is  nonsymmetric,  but  in  general  more 
steps  are  required  to  compute  the  desired  Ritz  value  approximations.  The  explicitly 
restarted  Arnoldi  iteration  (ERA-iteration)  was  introduced  by  Saad  [30]  to  overcome 
these  difficulties.  The  idea  is  based  on  similar  ones  developed  for  the  Lanczos  process 
by  Paige  [26],  Cullum  and  Donath  [12],  and  Golub  and  Underwood  [19].  Karush  [23] 
proposes  what  appears  to  be  the  first  example  of  a  re-started  iteration. 

The  Arnoldi  method  is  introduced  in  this  section  and  implicit  restarting  is  presented 
in  the  following  section. 

After  k  steps,  the  Arnoldi  algorithm  computes  a  truncated  factorization 
(2.1)  AVk  =  VkHk  +  fkel, 

of  A  e  RnXn  to  upper  Hessenberg  form  where  14  =  fa.  The  vector  fa  is  the  residual 
and  is  orthogonal  to  the  columns  of  V4.  The  matrix  Hk  6  Rfcxfc  is  an  upper  Hessenberg 
matrix  that  is  the  orthogonal  projection  of  A  onto  Range(Vk )  =  K.k{A,v i). 

The  following  procedure  shows  how  the  factorization  is  extended  from  length  k  to 
k  +  p. 

Algorithm  2.1 

function  [Vk+P,  Hk+P,  fa+p]  =  Arnoldi  (A,  14,  Hk ,  fa,  k,p) 

Input:  AVk  -  VkHk  =  fke{  with  Vjf  14  =  fa,  Vjf  fa  =  0. 

Output.  AVk+p  —  Vk+pHk+p  —  fk+p^k+p  with  ^4+p^4+p  =  fa+p,  und  14+ pfk+p  =  O’ 


1 .  For  j  = 

1,2.  ..p 

2. 

fik+j 

II  fk+j 

—  1  || )  fik+j 

=  0  then 

stop; 

3. 

vk+j 

fk+j  — 

l@k+j>  Vk+j 

-  [Vjfc+i-i, 

i  vk+j\ 

1 

W  <— 

Avk+j , 

5. 

hk+j 

T 

£ 

1 

\w;  ak+J  v-  v[+iw  ; 
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6. 


Hk+j 


Rk+j  —  1  h*k+j 

@k+jek+j- 1  ak+j 


/■  fk-\-j  *  j^k+j  i 

If  k  =  0  then  Ki  =  rq  represents  the  initial  vector.  In  order  to  ensure  fk  ~  0  in  fi¬ 
nite  precision  arithmetic  the  above  algorithm  requires  some  form  of  re-orthogonalization 
at  step  7  [33]. 

In  exact  arithmetic  the  algorithm  continues  until  fk  —  0  for  some  k  <  n.  All 
of  the  intermediate  Hessenberg  matrices  Hj  are  unreduced  for  j  <  k.  A  Hessenberg 
matrix  is  said  to  be  unreduced  if  all  of  its  main  sub-diagonal  elements  are  nonzero. 
The  residual  vanishes  at  the  first  step  k  such  that  dimA4+i  (A,  iq)  =  k  and  hence  is 
guaranteed  to  vanish  for  some  k  <  n  .  The  following  result  indicates  when  an  exact 
truncated  factorization  occurs.  This  is  desirable  since  the  columns  of  14  form  a  basis 
for  an  invariant  subspace  and  the  eigenvalues  of  Hk  are  a  subset  of  those  of  A. 

Theorem  2.2  Let  (2.1 )  define  a  k-step  Arnoldi  factorization  of  A,  with  Hk  unreduced. 
Then  fk  =  0  if  and  only  if  v \  =  QkV  ivhere  AQk  =  QkRk  with  QkQk  —  h  and  Rk  is 
an  upper  triangular  matrix  of  order  k. 

Proof:  See  [33].  □ 

In  Theorem  2.2,  the  span  of  the  k  columns  of  Qk  represent  an  invariant  subspace  for 
A.  The  diagonal  elements  of  Rk  are  eigenvalues  of  A.  We  call  AQk  =  QkRk  a  partial 
Schur  decomposition  of  A.  In  particular,  if  the  initial  vector  is  a  linear  combination 
of  k  linearly  independent  eigenvectors  then  the  k- th  residual  vector  vanishes.  It  is 
therefore  desirable  to  to  devise  a  method  that  forces  the  starting  vector  V\  to  be  a 
linear  combination  of  eigenvectors  corresponding  to  the  wanted  eigenvalues. 

The  algorithms  of  this  paper  are  appropriate  when  the  order  of  A  is  so  large  that 
storage  and  computational  requirements  prohibit  completion  of  the  algorithm  that 
produces  Vn  and  Hn.  Working  in  finite  precision  arithmetic  generally  removes  the 
possibility  of  the  computed  residual  ever  vanishing  exactly. 

As  the  norm  of  fk  decreases,  the  eigenvalues  of  Hk  become  better  approximations 
to  those  of  A.  Experience  indicates  that  ||4||  rarely  becomes  small  let  alone  zero.  But 
as  the  order  of  Hk  increases  certain  eigenvalues  of  H  may  emerge  as  excellent  estimates 
to  eigenvalues  of  A.  Since  the  interest  is  in  a  small  subset  of  the  eigensystem  of  A, 
alternate  criteria  that  allow  termination  for  A;  <C  n  are  needed.  Let  HkV  =  yd  where 
||r/||  =  1.  Define  the  vector  x  =  Vy  to  be  a  Ritz  vector  and  the  scalar  9  to  be  Ritz 
value.  Then 


\\AVky  ~  VkHky\\  =  \\Ax  —  x6\\, 

(2-2)  =  IIMIIc^l, 

indicates  that  if  the  last  component  of  an  eigenvector  for  Hk  is  small  the  Ritz  pair  (x,9) 
is  an  approximation  to  an  eigenpair  of  A.  This  pair  is  exact  for  a  nearby  problem:  it 
is  easily  shown  that  (A  +  E)x  —  x6  with  E  —  -(e](y)fkXH.  The  advantage  of  using 
the  Ritz  estimate  (2.2)  is  to  avoid  explicit  formation  of  the  quantity  AVkV  —  Vky9  when 
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accessing  the  numerical  accuracy  of  an  approximate  eigenpair.  Recent  work  by  Chatelin 
and  Fraysee  [10,  11]  and  Godet-Thobie  [16]  suggests  that  when  A  is  highly  non-normal, 
the  size  of  e£y  is  not  an  appropriate  guide  for  detecting  convergence.  If  the  relative 
departure  from  normality  defined  by  the  Henrici  number  \\AAT  -  AtA||^/||A2||f,  is 
large,  the  matrix  A  is  considered  highly  non-normal.  Assuming  that  A  is  diagonalizable, 
a  large  Henrici  number  implies  that  the  basis  of  eigenvectors  is  ill-conditioned  [10]. 
Bennani  and  Braconnier  compare  the  use  of  the  Ritz  estimate  and  direct  residual 
|| Ax  —  x6\\  in  Arnoldi  algorithms  [6].  They  suggest  normalizing  the  Ritz  estimate 
by  the  norm  of  A  resulting  in  a  stopping  criteria  based  on  the  backward  error.  The 
backward  error  is  defined  as  the  smallest,  in  norm,  perturbation  A  A  such  that  the  Ritz 
pair  is  an  eigenpair  for  A  +  A  A.  Scott  [32]  presents  a  lucid  account  of  the  many  issues 
involved  in  determining  stopping  criteria  for  the  unsymmetric  problem. 


3  The  Implicitly  Restarted  Arnoldi  Iteration 


Theorem  2.2  motivates  the  selection  of  a  starting  vector  that  will  lead  to  the  construc¬ 
tion  of  an  approximate  basis  for  the  desired  invariant  subspace  of  A.  The  best  possible 
starting  vector  would  be  a  linear  combination  of  a  Schur-basis  for  the  desired  invariant 
subspace.  The  iRA-iteration  iteratively  restarts  the  Arnoldi  factorization  with  the  goal 
of  forcing  the  starting  vector  closer  and  closer  to  the  desired  invariant  subspace.  The 
scheme  is  called  implicit  because  the  updating  of  the  starting  vector  is  accomplished 
with  an  implicitly  shifted  QR  mechanism  on  Hk .  This  will  allows  us  to  update  the 
starting  vector  by  working  with  orthogonal  matrices  that  live  in  R,k><k  rather  than  in 

pnXn 

The  iteration  starts  by  extending  a  length  k  Arnoldi  factorization  by  p  steps.  Next, 
p  shifted  QR  steps  are  performed  on  Hk+P.  The  last  p  columns  of  the  factorization  are 
discarded  resulting  in  a  length  k  factorization.  The  iteration  is  defined  by  repeating  the 
above  process  until  convergence.  As  an  example,  suppose  that  p  —  1  and  k  represents 
the  dimension  of  the  desired  invariant  subspace.  Let  p  be  a  shift  and  let  Hk—pl  =  QkRk 
with  Qk  orthogonal  and  R k  upper  triangular  matrices,  respectively.  Then  from  (2.1) 


(3.1) 

(A  -  fiI)Vk  -  Vk(Hk  -  pi) 

=  /fcefc> 

(3.2) 

{A  —  pI)Vk  -  14 QkRk 

= 

(3.3) 

(A  -  pI)(VkQk )  -  ( VkQk)(RkQk ) 

=  fkekQk , 

(3.4) 

A{VkQk )  -  (VkQk)(RkQk  +  rI) 

—  fk^k  Qk- 

The  matrices  are  updated  via  14  VkQk  and  H k  <—  RkQk  +  pi  and  the  latter  ma¬ 
trix  remains  upper  Hessenberg.  However,  equation  (3.4)  is  not  a  legitimate  Arnoldi 
factorization.  Partitioning  the  matrices  in  this  equation  results  in 


(3.5) 


A[Vk~  \,vk] 


[Vk-i,vk] 


Rk—l  hk 

Rke\ek_\  ^ k 


+  fk^kQk- 


The  relation  (3.4)  fails  to  be  an  Arnoldi  factorization  since  the  matrix  fkekQk  has  a 
non-zero  ( k  -  l)-st  column.  Equating  the  first  k  —  1  columns  of  (3.5)  we  have 

(3-6)  AVk-i  =  Vfc-ii/fc_i  -I-  (f3kvk  +  (Tkfk)ek-i, 
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where  <7k  =  e\Qkek_x.  Performing  the  update  fk_x  fikvk  +  crkfk  and  noting  that 
V/c-i/fc-i  =  0  it  follows  that  equation  (3.6)  is  a  k  —  1  step  Arnoldi  factorization. 

We  now  show  that  the  IRA-iteration  is  equivalent  to  forming  the  leading  portion 
of  an  implicitly  shifted  QR-iteration.  Note  that  equations  (3.1)-(3.4)  are  valid  for 
1  <  k  <  n.  In  particular,  extending  the  factorization  of  equation  (3.1)  by  n  -  k 
steps  gives  fn  —  0  and  A  Vn  —  VnHn  =  0  defines  a  decomposition  of  A  into  upper 
Hessenberg  form.  Let  QnRn  =  Hn  -  pi  where  Qn  and  Rn  are  orthogonal  and  upper 
triangular  matrices  of  order  n,  respectively.  Since  Qk  and  Rk  are  the  leading  principles 
sub-matrices  of  order  k  for  Qn  and  Rn ,  respectively,  VnQnRnex  =  VkQkRkex  and 
ej Rnex  =  ex  Rke x  follow.  Post  multiplication  of  equation  (3.2)  with  ex  exposes  the 
relationship 


( A  -  pl)v i  =  VkQkexpn  =  VnQneipu  =  (A  -  pI)Vnex, 

where  pxx  =  ex  Rkex  and  vx  =  Vkex.  In  words,  the  first  column  of  the  updated  k  step 
factorization  matrix  is  the  same  as  the  first  column  of  the  orthogonal  matrix  obtained 
after  a  complete  QR  step  on  A  with  shift  p.  Thus,  the  IRA-iteration  may  be  viewed  as 
a  truncated  version  of  the  standard  implicitly  shifted  QR-iteration.  This  idea  may  be 
extended  for  up  to  p  >  1  shifts  [33],  One  cycle  of  the  iteration  is  pictured  in  Figures  1 — 
3.  Application  of  the  shifts  may  be  performed  implicitly  as  in  the  QR  algorithm.  If 
the  shifts  are  in  complex  conjugate  pairs  then  the  implicit  double  shift  can  be  used  to 
avoid  complex  arithmetic. 

Numerous  choices  are  possible  for  the  selection  of  the  p  shifts.  One  immediate 
choice  is  to  use  the  p  unwanted  eigenvalues  of  Hk+p.  In  exact  arithmetic,  the  last  p 
off  diagonal  elements  of  Hk+P  are  zero  and  the  Arnoldi  factorization  decouples.  The 
reader  is  referred  to  [33]  and  [9]  for  further  information. 

The  number  of  shifts  to  apply  at  each  cycle  of  the  above  iteration  is  problem  de¬ 
pendent.  At  present  there  is  no  a-priori  analysis  to  guide  the  selection  of  p  relative 
to  k.  I  he  only  formal  requirement  is  that  1  <  p  <  n  —  k.  However,  computational 
experience  indicates  that  p  >  k  is  preferable.  If  many  problems  of  the  same  type  are 
to  be  solved,  experimentation  with  p  for  a  fixed  k  should  be  undertaken.  This  usu¬ 
ally  decreases  the  required  number  matrix-vector  operations  but  increases  the  work 
and  storage  required  to  maintain  the  orthogonal  basis  vectors.  The  optimal  cross-over 
with  respect  to  CPU  time  varies  and  must  be  determined  empirically.  Further  research 
is  needed  to  understand  how  to  estimate  this  optimal  value  a-priori. 

Among  the  several  advantages  an  implicit  updating  scheme  possess  are: 

•  fixed  storage  requirements. 

•  r  he  ability  to  maintain  a  prescribed  level  of  orthogonality  for  the  columns  of  V 
since  k  is  of  modest  size. 

•  The  incorporation  of  the  well  understood  numerical  and  theoretical  behavior  of 
the  QR  algorithm. 

In  particular,  application  of  a  shift  may  result  in  one  of  the  sub-diagonal  elements  of  H 
becoming  small.  The  impact  of  the  deflation  strategies  associated  with  the  QR-iteration 
upon  the  IRA-iteration  are  addressed.  The  next  section  examines  what  deflation  is 
within  an  Arnoldi  factorization. 
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Figure  1:  The  set  of  rectangles  represents  the  matrix  equation  14 +PHk+P  +  /fc+Pe^+p  of 
an  Arnoldi  factorization.  The  unshaded  region  on  the  right  is  a  zero  matrix  of  k  +  p—l 
columns. 


Figure  2:  After  performing  p  implicitly  shifted  qr  steps  on  Hk+p,  the  middle  set  of 
pictures  illustrates  V* +PQQT Hk+PQ  +  fk+pel+PQ-  The  last  p+ 1  columns  of  fk+Pel+pQ 
are  non-zero  because  of  the  QR-iteration. 


+ 


Figure  3:  After  discarding  the  last  p  columns,  the  final  set  represents  VkHk  +  fk ^ 
a  length  k  Arnoldi  factorization. 
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4  Deflation  within  an  IRA-iteration 


As  the  iteration  progresses  the  Ritz  estimates  (2.2)  decrease  at  different  rates.  When  a 
Ritz  estimate  is  small  enough,  the  corresponding  Ritz  value  is  said  to  have  converged. 
The  converged  Ritz  value  may  be  wanted  or  unwanted.  In  either  case,  a  mechanism  to 
deflate  the  converged  Ritz  value  from  the  current  factorization  is  desired.  Depending 
on  whether  the  converged  Ritz  value  is  wanted  or  not,  it  is  useful  to  define  two  types  of 
deflation.  Before  we  do  this,  it  will  prove  helpful  to  illustrate  how  deflation  is  achieved. 
Suppose  that  after  m  steps  of  the  Arnoldi  algorithm  we  have 


(4.1) 


A[VuV2] 


[VUV2] 


H  i  G 

eejeJ  H2 


+  /eL 


where  V\  G  R?lXj ,  Hi  G  KjXj  for  1  <  j  <  m.  If  c  is  suitably  small  then  the  factorization 
decouples  in  the  sense  that  a  Ritz  pair  (y,@)  for  Hi  provides  an  approximate  eigen  pair 
(x  ~  with  a  Ritz  estimate  of  \eejy\.  Setting  e  to  zero  splits  a  nearby  problem 

exactly  and  setting  e  =  0  is  called  deflation.  If  e  is  suitably  small  then  all  the  eigenvalues 
of  Hi  may  be  regarded  as  converged  Ritz  values. 


4.1  Locking 

If  deflation  has  taken  place  and  all  of  the  deflated  Ritz  values  are  wanted  then  they  are 
considered  locked.  This  means  that  subsequent  implicit  restarting  is  done  on  the  basis 
V2-  The  sub-matrices  effected  during  implicit  restarting  are  G,  H2  and  V2.  However, 
during  the  phase  of  the  iteration  that  extends  the  Arnoldi  factorization  from  k  to  k-\-p 
steps  ,  all  of  the  columns  of  [Vi ,  V2]  participate  just  as  if  no  deflation  had  occurred.  This 
assures  that  all  of  the  new  Arnoldi  basis  vectors  are  orthogonalized  against  converged 
Ritz  vectors  and  prevents  the  introduction  of  spurious  eigenvalues  into  the  subsequent 
iteration.  Moreover,  this  provides  a  means  to  safely  compute  multiple  eigenvalues  when 
they  are  present.  A  block  method  is  not  required  if  deflation  and  locking  are  used.  The 
concept  of  locking  was  introduced  by  Jennings  and  Stewart  [37]  as  a  deflation  technique 
for  simultaneous  iteration. 


4.2  Purging 

If  deflation  has  occurred  but  some  of  the  deflated  Ritz  values  are  unwanted  then  an¬ 
other  mechanism,  purging,  must  be  introduced  to  remove  the  unwanted  Ritz  values 
and  corresponding  vectors  from  the  factorization.  In  exact  arithmetic  this  would  not 
be  necessary  because  the  implicit  shift  technique  would  accomplish  the  removal  of 
the  unwanted  Ritz  pair  from  the  leading  portion  of  the  iteration.  However,  in  finite 
precision  it  may  be  impossible  to  accomplish  the  removal  due  to  the  forward  instabil¬ 
ity  [28,  39]  of  the  QR  algorithm.  The  basic  idea  of  purging  is  perhaps  best  explained 
with  the  case  of  a  single  deflated  Ritz  value. 

Let  j  =  1  in  (4.1)  and  equate  the  first  columns  of  both  sides  to  obtain 

(4.2)  Avi  =  viai  +  eV2ei, 

where  tq  =  Vq e t  and  Hi  =  oq.  Equation  (4.2)  is  an  Arnoldi  factorization  of  length 
one.  The  Ritz  value  cq  has  Ritz  estimate  |c|. 
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Equating  the  last  m  —  1  columns  of  (4.1)  results  in 
(4-3)  AV2  =  VXG  +  V2-H2  +  1’ 

Suppose  that  ax  represents  an  unwanted  Ritz  value.  If  A  were  symmetric  then  G  =  eej 
and  equation  (4.3)  would  become 

(A  +  E)V 2  =  V2H2  +  fem-\  i 

where  E  =  -ev^V2ex)r  -  e(V2ex)vf .  Since  ||£||  =  e  equation  (4.3)  defines  a  length 
m  —  1  Arnoldi  factorization  for  a  nearby  problem.  The  unwanted  Ritz  pair  (tq,  aq)  may 
be  purged  from  the  factorization  simply  by  taking  V”  —  V2  and  H  —  H2  and  setting 
G  =  0  in  (4.3).  If  A  is  not  symmetric,  the  lx(m-l)  matrix  G  couples  vx  to  the  rest 
of  the  basis  vectors  V2.  This  vector  may  be  decoupled  using  the  standard  Sylvester 
equation  approach  [3,  17].  Purging  then  takes  place  as  in  the  symmetric  case.  However, 
the  new  set  of  basis  vectors  must  be  re-orthogonalized  in  order  to  return  to  an  Arnoldi 
factorization.  1  his  procedure  is  developed  in  §  5  and  §  6  including  the  case  of  purging 
several  vectors. 

4.3  Complications 

An  immediate  question  is:  Do  any  sub-diagonal  elements  in  the  Hessenberg  matrix 
ol  the  factorization  (4.1)  become  negligible  as  an  iRA-iteration  progresses  ?  Since  a 
cycle  of  the  Arnoldi  iteration  involves  performing  a  sequence  of  qr  steps,  the  question 
is  answered  by  considering  the  behavior  of  the  QR-iteration  upon  upper  Hessenberg 
matrices.  In  exact  arithmetic  under  the  assumption  that  the  Hessenberg  matrix  is 
unreduced,  only  the  last  sub-diagonal  element  may  become  zero  when  shifting.  But 
the  other  sub-diagonal  elements  may  become  arbitrarily  small. 

Computing  in  finite  precision  arithmetic,  however,  complicates  the  situation.  A 
robust  implementation  of  the  qr  algorithm  sets  a  sub-diagonal  element  to  zero  if  it  is 
in  magnitude  less  than  some  prescribed  threshold  and  this  technique  is  also  adopted 
for  deflation.  This  deflation  overcomes  the  technical  difficulty  associated  with  tiny 
sub-diagonals  and  improves  the  convergence  of  the  IRA-iteration.  However,  there  are 
further  difficulties. 

The  phenomena  of  the  forward  instability  of  the  tridiagonal  QR-iteration  [28]  is 
explored  by  Parlett  and  Le.  They  observe  that  while  the  implicitly  shifted  QR-iteration 
is  always  backward  stable,  there  are  cases  where  severe  forward  instability  can  occur. 
It  is  possible  for  a  single  QR-iteration  to  result  in  a  computed  Hessenberg  matrix  with 
entries  that  have  no  significant  digits  in  common  with  the  corresponding  entries  of 
the  Hessenberg  matrix  that  would  have  been  determined  in  exact  arithmetic.  The 
implication  is  that  the  computed  sub-diagonal  entries  may  not  be  reliable  indicators 
for  decoupling  the  Arnoldi  factorization.  Le  and  Parlett’s  analysis  implies  that  the 
Hessenberg  matrix  may  lose  significant  digits  when  the  shift  used  is  nearly  an  eigenvalue 
of  H,  and  the  last  component  of  the  normalized  eigenvector  is  small.  This  indicates 
that  it  may  be  impossible  to  filter  out  unwanted  eigenvalues  with  the  implicit  restarting 
technique  using  exact  shifts  and  this  is  the  motivation  for  developing  both  the  locking 
and  purging  techniques. 
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5  Deflating  Converged  Ritz  Values 

During  an  Arnoldi  iteration,  Ritz  values  may  converge  with  no  small  sub-diagonal 
elements  appearing  on  the  sub-diagonal  of  Hk ■  However,  when  a  Ritz  value  converges, 
it  is  always  possible  to  make  an  orthogonal  change  of  basis  in  which  the  appropriate 
sub-diagonal  of  II k  is  zero.  The  following  result  indicates  how  to  exploit  the  convergence 
information  available  in  the  last  row  of  the  eigenvector  matrix  for  Hk.  For  notational 
convenience,  all  subscripts  are  dropped  on  the  Arnoldi  matrices,  V,  H  and  /,  for  the 
remainder  of  this  section. 

Lemma  5.1  Let  Hy  =  yO  where  H  6  is  an  unreduced  upper  Hessenberg  matrix 

and  9  £  R  xuith  ||r/||  =  1  .  Let  W  be  a  Householder  matrix  such  that  Wy  =  e\T  where 
|r|  =  1.  Then 

(5-1)  elW  =  el  +  wT, 

where  IMI  ^  V^lejTyl  and 

(5-2)  WT  H  W  e\  =  ej0. 

Proof:  The  required  Householder  matrix  has  the  form 

W  =  I-i(y-  rei)(y  -  Te^)T, 
where  7  =  (1  +  |e^t/|)_1.  A  direct  computation  reveals  that 
(5-3)  e'j.  W  =  ej  +  w1 1 

where  wT  =  7 ely(rej  -  yT).  Estimating 

w  =  Trfe"'-™1"- 

establishes  the  bound  on  ||w||.  The  final  assertion  (5.2)  follows  from 

WT  HWe-y  =  T~lWTHy, 

=  T~1eWTy, 

=  T-'OWy, 

=  9e\ . 

□ 

Lemma  (5.1)  indicates  that  the  last  row  and  column  of  W  differ  from  the  last  row 
and  column  of  Lk  by  terms  of  order  \e%y\.  The  Ritz  estimate  (2.2)  will  indicate  when 
it  is  safe  to  deflate  the  corresponding  Ritz  value  9,  Rewriting  (2.1)  as 

AVW  =  VWWtHW  +  fekW, 


10 


and  using  both  (5.1)  and  (5.2)  and  partitioning  we  obtain 


(5.4) 


AVW  =  VW 


0  hT 
0  H 


+  M  +  fw1 


Equation  (5.4)  is  not  an  Arnoldi  factorization.  The  matrix  H  of  order  k  —  1  needs  to 
be  returned  to  upper  Hessenberg  form.  Care  must  be  taken  not  to  disturb  the  matrix 
fej  and  the  first  column  of  WT HW.  To  start  the  process  we  compute  a  Householder 
matrix  Y\  such  that 


,  6  9 

PkeI-2  7 


with  eJ_^Y\  =  e^_j.  The  above  idea  is  repeated  resulting  in  Householder  matrices 
Yi,y2, . . . ,  Yfc_3  that  return  H  to  upper  Hessenberg  form.  Defining 


[o  y,y2-  -n_3  J  ’ 

it  follows  by  the  construction  of  the  Yj  that  ejY  =  ej  and 
(5.5)  YTWTHWYex  =  0ex. 


The  process  of  computing  a  similarity  transformation  as  in  equation  (5.5)  is  not  new. 
Sections  20-  25,  chapter  9  of  [40]  discusses  the  more  general  notion  of  deflating  with 
invariant  subspaces.  Wilkinson  references  the  work  of  Feller  and  Forsythe  [15]  who 
appear  to  be  the  first  to  use  elementary  Householder  transformations  for  deflation. 
Problem  7.4.8  of  [17]  addresses  the  case  when  working  with  upper  Hessenberg  matrices. 
What  appears  to  be  new  is  the  application  to  the  Arnoldi  factorization  for  converged 
Ritz  values. 

Since 

||/tnTy||  =  ||/||||yTrn||  =  ||/||  ||u,||, 
the  size  of  H/w^H  remains  the  unchanged.  Making  the  updates 


V  <-  VWY, 

H  <-  YtWtHWY , 

rp  rp 

IV  <—  W  Y, 

we  obtain  the  relation 

(5.6)  AC  =  VH  +  fe[  +  fwT. 

A  deflated  Arnoldi  factorization  is  obtained  from  (5.6  )  by  discarding  the  term  fwT. 

The  following  theorem  shows  that  the  deflated  Arnoldi  factorization  resulting  from 
this  scheme  is  an  exact  k-step  factorization  of  a  nearby  matrix. 
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Theorem  5.2  Let  an  Arnoldi  factorization  of  length  k  be  given  by  (5.6)  where  Hy  =  y6 
and  \/2\e])y |  ||/||  <  e||A||  for  e  >  0.  Then  there  exists  a  matrix  E  £  RnXn  such  that 

(5.7)  ( A  +  E)V  =  VH  +  feTk , 

where 

m\  < 

Proof :  Subtract  f  wT  from  both  sides  of  equation  (5.6).  Set  E  =  —  f(Vw)T  and  then 

EV  =  -f(Vw)rV, 

=  ~/™T, 

and  equation  (5.7)  follows.  Using  Lemma  5.1 

11*11  =  H/ll  HI. 

=  ^\eTky\  ll/ll, 

<  <U\\- 


□ 

If  4  is  symmetric  then  the  choice  E  =  -f(Vw)T  -  ( Vw)f 1  results  in  a  symmetric 
perturbation.  If  e  is  on  the  order  of  unit  roundoff  then  the  deflation  scheme  introduces 
a  perturbation  of  the  same  order  to  those  already  present  from  computing  the  Arnoldi 
factorization  in  floating  point  arithmetic. 

Once  a  converged  Ritz  value  6  is  deflated,  the  Arnoldi  vector  corresponding  to  6  is 
locked  or  purged  as  described  in  the  previous  section.  The  only  difficulty  that  remains 
is  purging  when  A  is  nonsyminetric. 

If  A  is  not  symmetric  then  the  Ritz  pair  may  not  purged  immediately  because  of 
the  presence  of  h.  A  standard  reduction  of  H  to  block  diagonal  form  is  used.  If  6  is 
not  an  eigenvalue  of  //,  then  we  may  construct  a  vector  z  £  Rfc_1  so  that 


(5.8) 

Solving  the  linear  system 

(5.9) 

determines  z.  Define 


1 

E-i 

i _ 

'  1 

'  l 

'  9 

1  »  J 

4-i 

4-i 

H 

(HT  ~  ®Ik-\  )z  =  h. 


Z  = 


1 

4-i 


Post  multiplication  of  equation  (5.6)  by  Z  results  in 

6 


AVZ  =  UZ 


H 


+  fek  +  fw  Z, 
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since  eJ.Z  —  ejT.  Equating  the  last  k  —  1  columns  of  the  previous  expression  results  in 


(5.10) 

AV 

r  t  1 

Z1 

4-i 

•=  V 

zT 

4-i 

H  +  feTk_  1  +  fwT 

1 

•“-I 

1 _ 1 

Compute  the  factorization  (using  k  —  1  Givens  rotations) 

(5.11)  QR  =  f  1  , 


where  Q  £  wjj[1  qTq  _  /fc_,  an(j  ft  js  an  upper  triangular  matrix  of  order 

k  —  1.  Since  the  last  k  —  1  columns  of  Z  are  linearly  independent,  R  is  nonsingular. 
Post  multiplying  equation  (5.10)  by  R~l  gives 

(5.12)  AVQ  =  VQRHR-1  +p^_lfel_l  +  fwTQ, 


where  pk~\  =  e^^Rek-i-  The  last  term  fwTQ  in  (5.12)  is  discarded  by  the  deflation 
scheme  and  this  relation  shows  that  the  discarded  term  is  not  magnified  in  norm  by  the 
purging  procedure.  The  matrix  RH R~'  remains  upper  Hessenberg  since  R  is  upper 
triangular.  Partitioning  Q  conformally  with  the  right  side  of  equation  (5.11)  results  in 


and  it  follows  that  R.  1  =  Q 21.  A  tedious  derivation  also  shows  that  \pk-\\  >  1  and 
hence  the  Arnoldi  residual  is  not  amplified  by  the  purging.  The  final  purged  Arnoldi 
factorization  is 


(5.13)  AVQ  =  VQRHQ2,+p;ljel_v 

The  similarity  transformation  that  produces  the  new  upper  Hessenberg  matrix  also 
affects  the  eigenvectors  and  thus  the  Ritz  estimates.  If  Hu  =  uv  then  RH R~A Ru  = 
Ruu.  Normalizing  this  vector  to  have  unit  length  gives  the  new  Ritz  vector  as  y  = 
Ru/\\Ru\\.  The  new  Ritz  estimate  is  given  by 

\Pk-\(el-\V)\  ll/ll  =  (IPfc-ieLi^l/ll^ll)ll/ll> 

(5.14)  =  (|cLi«l/P«||)||/||. 


We  claim  that  this  estimate  is  the  same  as  the  Ritz  estimate  for  the  original  unde¬ 
flated  problem.  In  the  original  problem,  the  vector 


'  0  ’ 

T 

Z  U 

u 

u 

is  an  eigenvector  of  the  original  H .  The  norm  of  this  vector  is  1 juTu  +  ( zTu )2.  There¬ 
fore  the  original  Ritz  estimate  for  the  Ritz  value  v  is 


(5.15) 


leLi»l 

\JuTU  +  ( ZTU )2 


ll/ll- 
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However,  from  equation  (5.11)  , 


P«||  =  II  Wl  =  II 


=  \JuTU  +  ( ZTU )2 


and  the  two  Ritz  estimates  (5.15)  and  (5.14)  are  the  same  before  and  after  the  purging 
operation. 

Performing  the  set  of  updates 


V  -  VQ, 

H  -  RHQ2U 
f  -  Pk- 

defines  equation  (5.13)  as  an  Arnoldi  factorization  of  length  A;  —  1.  Theorem  5.2  implies 
this  is  an  Arnoldi  factorization  for  a  nearby  matrix.  It  is  easily  verified  that  VT  + 

wT)  =  0  and  that  H  is  an  upper  Hessenberg  matrix  of  order  k  —  1. 


6  A  Practical  Deflating  Procedure  for  the  Arnoldi  Fac¬ 
torization 

The  practical  issues  associated  with  a  numerically  stable  deflating  procedure  are  ad¬ 
dressed  in  this  section.  These  include: 

1.  Performing  the  deflation  in  real  arithmetic  when  a  converged  Ritz  value  has 
a  non-zero  imaginary  component. 

2.  Deflation  with  more  than  one  converged  Ritz  value. 

3.  Error  Analysis. 

Section  6.2  presents  two  algorithms  that  implement  the  deflation  schemes.  The  error 
analysis  of  the  two  deflation  schemes  is  presented  in  the  next  section. 


6.1  Deflation  with  Real  Arithmetic 


Suppose  H{y  -f  iz)  =  (0  +  ifi)(y  +  iz)  where  y  and  z  are  unit  vectors  in  Rfc,  H  E  Rkxk 
and  fi  ^  0.  Thus 


H[y,z\  =  [y,  z\ 


6  n 

— n  8 


[y,z]C. 


Factor 

(6.1) 


[y,  z]  = 


u 


T 

0 


where  UTU  =  Ik  and  T  is  an  upper  triangular  matrix.  It  is  easily  shown  that  y  and 
z  are  linearly  independent  as  vectors  in  R*  since  fi  /  0  and  the  nonsingularity  of  T 
follows.  Performing  a  similarity  transformation  with  U  on  [y,  z]  gives 


UT[x,y]U[e  !,e2] 


TCT-1 

0 


14 


Suppose  that  H  corresponds  to  an  Arnoldi  factorization  of  length  k  and  that  \ejy\  = 
0(e)  =  \ekz\-  order  to  deflate  the  complex  conjugate  pair  of  eigenvalues  from  the 
factorization  in  an  implicit  manner,  we  require  that  e£U  =  ej  +  uT  where  ||u||  =  0(e). 

We  now  show  that  the  magnitudes  of  the  last  components  of  y  and  z  are  not 
sufficient  to  guarantee  the  required  form  for  U.  Suppose  that  z  =  y  cos  <f>  r  sm  <f> 
where  r  is  a  unit  vector  orthogonal  to  y  and  (j>  measures  the  positive  angle  between  y 
and  z.  Lemma  5.1  allows  a  Householder  W  matrix  such  that 

WT[y,  z\  =  [r\e\ ,  T\e\  cos  <f>  +  WTr  sin  <j>]  = 


n  C 

0  z 


where  |ri|  =  1  and  the  last  column  and  row  of  W  and  Ik  are  order  ejy  equivalent.  To 
compute  the  required  orthogonal  factorization  in  equation  (6.1)  another  Householder 

_  —  V  / 


matrix  Q 


I  0 
0  Q 


is  needed  so  that  QTz  =  ±||z||ei.  But  Lemma  5.1  only  results 


in  eJ-iQ  ~  eI-\  +  <7T  with  ||<?||  =  0(e)  if  is  small  relative  to  ||£||.  Unfortunately, 
if  4>  is  small,  WT z  ~  T\e\  and  ||z||  ~  ef).  Hence  we  cannot  obtain  the  required  form  for 
U  =  WQ. 


Fortunately,  when  y  and  z  are  nearly  aligned,  /z  may  be  neglected  as  the  following 
result  demonstrates. 


Lemma  6.1  Let  H (y  +  iz)  =  (6  +  ifi)(y  +  iz)  where  y  and  z  are  unit  vectors  in  Rfc, 
H  6  R kxk  and  /z  ^  0.  Suppose  that  cf>  measures  the  positive  angle  between  y  and  z. 
Then 

(6.2)  |/z|  <  sin  <f>\\H  ||. 

Proof:  Let  z  =  z/cos<^)  +  r  sin  <f)  where  r  is  a  unit  vector  orthogonal  to  y  and 

measures  the  positive  angle  between  y  and  z.  Equating  real  and  imaginary  parts  of 
H(y  +  iz)  =  (6  +  ip)(y  +  iz)  results  in  Hy  -  yQ  -  z/z  and  Hz  =  yp  +  z0.  The  desired 
estimate  follows  since 


2/z  =  y1  Hz  —  zT Hy  =  sin  (f>(yT Hr  —  r  l  Hy), 
results  in  |/z|  <  sin</>||//||. 

□ 

For  small  <j>,  y  and  z  are  almost  parallel  eigenvectors  of  H  corresponding  to  a  nearly 
multiple  eigenvalue.  Numerically,  we  set  p  to  zero  and  deflate  one  copy  of  0  from  the 
Arnoldi  factorization. 

A  computable  bound  on  the  size  of  the  angle  is  now  determined  using  only  the 
real  and  imaginary  parts  of  the  eigenvector.  The  second  Householder  matrix  Q  should 
not  be  computed  if 

(6.3)  l«Li*l  >  lUllleJA 

Recall  that  Lemma  5.1  gives  e{W  =  +  wT  where  wT  —  -yejy^iej  -  yT)  and 

7  =  (1  +  \eJy\)~l-  Thus 

eLii  =  eTkWTz  =  eTkWz  =  eTkz  +  wT  z, 
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where  the  symmetry  of  W  is  used.  The  estimate 

Pll  =  !l[°>  zT]T\\  =  II  sin  <t>  =  sin  <f>, 

follows  since  W  is  orthogonal  and  r  is  a  unit  vector.  Rewriting  equation  (6.3),  we 
obtain 


(6.4) 


sin  4>  <  |  — 


eTz  +  wTz. 


=  11  + 


T 

4  2 

wTz 
T  I 

4  2 


=  |l  +  7  (^eJz-yTz) 


T 

T  I 

4 2 


as  our  computable  bound. 

Suppose  that  HX  =  XD  where  X  €  RfcXj  and  D  is  a  quasi-diagonal  matrix.  The 
eigenvalues  of  H  are  on  the  diagonal  of  D  if  they  have  zero  imaginary  component  and 
in  blocks  of  two  for  the  complex  conjugate  pairs.  The  columns  of  X  span  the  eigenspace 
corresponding  to  diagonal  values  of  D.  For  the  blocks  of  order  two  on  the  diagonal  the 
corresponding  complex  eigenvector  is  stored  in  two  consecutive  columns  of  X ,  the  first 
holding  the  real  part,  and  the  second  the  imaginary  part.  If  we  want  to  block  deflate 
X ,  where  the  last  row  is  small,  from  H ,  then  we  could  proceed  as  follows.  Compute 


the  orthogonal  factorization  X  =  Q 


R 

0 


via  Householder  reflectors  where  Q1  Q  =  Ik 


and  R  6  Rfcxfc  is  upper  triangular.  Then  the  last  row  and  column  of  Q  differ  from  that 
of  Ik  with  terms  on  the  same  order  of  the  entries  in  the  last  row  of  X  if  the  condition 
number  of  R  is  modest.  Thus  if  the  columns  of  X  are  not  almost  linearly  dependent, 
an  appropriate  Q  may  be  determined.  Finally,  we  note  that  when  H  is  a  symmetric 
tridiagonal  matrix,  an  appropriate  Q  may  always  be  determined. 


6.2  Algorithms  for  Deflating  Converged  Ritz  Values 

The  two  procedures  presented  in  this  section  extend  the  ideas  of  §  4  to  provide  deflation 
of  more  than  one  converged  Ritz  value  at  a  time.  The  first  purges  the  factorization 
of  the  unwanted  converged  Ritz  values.  The  second  locks  the  Arnoldi  vectors  corre¬ 
sponding  to  the  desired  converged  Ritz  values.  When  both  deflation  algorithms  are 
incorporated  within  an  iRA-iteration,  the  locked  vectors  form  a  basis  for  an  approxi¬ 
mate  invariant  subspace  of  A.  This  truncated  factorization  is  an  approximate  partial 
Schur  decomposition.  When  A  is  symmetric,  the  approximate  Schur  vectors  are  Ritz 
vectors  and  the  upper  quasi-triangular  matrix  is  the  diagonal  matrix  of  Ritz  values. 
Partition  a  length  m  Arnoldi  factorization  as 

(6.5)  !>„_,)  =  (V),  Vm_j)  ^ 

where  Hj  and  JIm-j  are  upper  quasi-triangular  and  unreduced  upper  Hessenberg  ma¬ 
trices,  respectively.  The  matrix  Hj  £  RjXj  contains  the  wanted  converged  Ritz  values 


+  fmel  +  fwT, 
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Figure  4:  The  matrix  product  VmHm  of  the  factorization  upon  entering  Algorithm  6.2 
or  6.3.  The  shaded  region  corresponds  to  the  converged  portion  of  the  factorization. 

of  the  matrix  Hm.  The  columns  of  Vj  6  RnXj  are  the  locked  Arnoldi  vectors  that  rep¬ 
resent  an  approximate  Schur  basis  for  the  invariant  subspace  of  interest.  The  matrix 
Hm~j  designates  the  trailing  sub-matrix  of  order  m  —  j.  Analogously,  the.  last  m  —  j 
columns  of  Vm  are  denoted  by  Vm-j.  We  shall  refer  to  the  last  m  -  j  columns  of  (6.5) 
as  the  active  part  of  the  factorization.  Finally,  Gj  £  R4Xm-J  denotes  the  sub-matrix 
in  the  north-east  corner  of  Hm.  Figure  4  illustrates  the  matrix  product  VmHm  of 
equation  (6.5). 

If  A  is  symmetric  the  two  deflation  procedures  simplify  considerably.  In  fact,  purg¬ 
ing  is  only  used  when  A  is  nonsymmetric  for  otherwise  Gj  =  0 jxm-j  and  both  Hj  and 
Hm-j  are  symmetric  tridiagonal  matrices.  Both  algorithms  are  followed  by  remarks 
concerning  some  of  the  specific  details. 

Algorithm  6.2 

function  \Vm ,  Hm  >  fm\  —  Lock  (F771,  fmi  Ai,  j) 

INPUT:  A  length  m  Arnoldi  factorization  A  Vrn  —  VmHm  +  .  The  first  j 

columns  of  Vm  represent  an  approximate  invariant  subspace  for  A.  The  leading  princi¬ 
pal  submatrix  Hj  of  order  j  of  Hm  is  upper  quasi-triangular  and  contains  the  converged 
Ritz  values  of  interest.  The  columns  of  Xi  6  RTO_J  Xl  are  the  eigenvectors  corresponding 
to  the  eigenvalues  that,  are  to  be  locked. 

OUTPUT:  A  length  m  Arnoldi  factorization  defined  by  Vm,  Hm  and  fm  where  the 
first  j  +  i  columns  of  Vm  are  an  approximate  invaraint  subspace  for  A. 
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1.  Compute  the  orthogonal  factorization 


Q 


Ri 

0 m—j—i 


xt 


where  Q  £  Rm  3  x  m  3  using  Householder  matrices  ; 

2.  Update  the  factorization 

H-m—j  QT H-m-jQ  Vm-j  Vm-jQ  i  Gj  <—  GjQ  / 

3.  Compute  an  orthogonal  matrix  P  £  Rm— j-ixm-j-i  using  Householder  matrices 

that  restores  to  upper  Hessenberg  form  ; 

4-  Update  the  factorization 

Hm-j—i  PT Hm-j-iP  /  Vm-j-t  Vm-j-iP  /  Gj+i  <—  Gj+iP  ; 


Line  1  computes  an  orthogonal  basis  for  the  eigenvectors  of  Hm-j  that  correspond 
to  the  Ritz  estimates  that  are  converged.  The  matrix  of  eigenvectors  in  line  1  satisfies 
the  equation  Hm_jXi  =  XiDt  where  Dt  is  a  quasi-diagonal  matrix  containing  the  eigen¬ 
values  to  be  locked.  From  the  §  6.1,  we  see  that  the  leading  sub-matrix  of  QT Hm-jQ 
of  order  i  is  upper  quasi-triangular.  The  required  relation  e^Q  —  +  qT ,  with  ||q|| 

small  is  guaranteed  if  the  condition  number  of  Ri  is  modest.  Since  i  is  typically  a  small 
number,  we  compute  the  condition  number  of  R%.  The  number  of  vectors  to  be  locked 
is  assumed  to  be  such  that  the  condition  number  of  Ri  is  small.  In  particular,  if  Hm  is 
a  symmetric  tridiagorial  matrix,  Q  always  has  the  required  form.  Lines  3-4  return  the 
updated  Hm-j  to  upper  Hessenberg  form. 

Before  entering  Purge,  the  unwanted  converged  Ritz  pairs  are  placed  at  the  front 
of  the  factorization.  A  prior  call  to  Lock  places  the  unwanted  values  and  vectors  to  the 
beginning  of  the  factorization.  Unlike  Lock,  the  procedure  Purge  requires  accessing 
and  updating  the  entire  factorization  in  the  nonsymmetric  case.  Thus,  for  large  scale 
nonsymmetric  eigenvalue  computations,  the  amount  purging  performed  should  be  kept 
to  a  minimum. 


Algorithm  6.3 

function  [Vm_;,  #m_i,  fm-i]  =  Purge  (Vm,  Hm ,  fm,j,  i) 

INPUT:  A  length  m  Arnoldi  factorization  AVm  -  VmHm  +  /me^.  The  first  i  +  j 
columns  ofVm  represent  an  approximate  invariant  subspace  for  A.  The  leading  principal 
submatrix  Hi+j  of  order  i+j  of  Hm  is  upper  quasi-triangular  and  contains  the  converged 
Ritz  values.  The  i  unwanted  converged  eigenvalues  are  in  the  leading  portion  of  Hl+j. 
The  converged  complex  conjugate  Ritz  pairs  are  stored  in  2x2  blocks  on  the  diagonal 
of  Hi+J . 

OUTPUT:  A  length  m  —  i  Arnoldi  factorization  defined  by  Vm-i,  Hm~i  and  fm-i 
purged  of  the  unwanted,  converged  Ritz  values  and  corresponding  Schur  vectors. 

Lines  1-3  purge  the  factorization  of  the  unwanted  converged  Ritz  values  contained 
in  the  leading  portion  of  Hm  ; 
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Vectors  to  be  Purged 


HI  Locked  Vectors 


□ 


Active  Factorization 


Figure  5:  The  matrix  product  VmHm  of  the  factorization  just  prior  to  discarding  in 
Algorithm  6.3.  The  darkly  shaded  regions  may  now  be  dropped  from  the  factorization. 


1.  Solve  the  Sylvester  set  of  equations. 


ZHm-i  -  HiZ  —  G{, 

for  Z  €  RlXm~!  that  arise  from  block  diagonalizing  Hrn  ; 


■  I 

z 

Ii  z 

Im—i 

*m—i 

Hm—i 

2.  Compute  the  orthogonal  factorization 


QRm-i  — 


Qr 

Qm—i 


Hm  —  i 


where  Q  £  R”1  x m  1  using  Householder  matrices  ; 

3.  Update  the  factorization  and  obtain  a  length  m  —  i  factorization  ; 
Hm—i  '  Rm—illm  —  iQm—i  t  Vm—i  *  VmQ  i  fm  —  i  <  i 

where  -  em_^Hm_jem_i  / 


At  the  completion  of  Algorithm  6.3  the  factorization  is  of  length  m  -  i  and  the 
leading  sub-matrix  of  order  j  will  be  upper  quasi-triangular.  The  wanted  converged 
Ritz  values  will  either  be  on  the  diagonal  if  real  or  in  blocks  of  two  for  the  complex 
conjugate  pairs.  Figure  5  shows  the  structure  of  the  updated  VmHm  just  prior  to 
discarding  the  unwanted  portions. 


The  solution  of  the  Sylvester  equation  at  line  1  determines  the  matrix  Z  that  block 
diagonalizes  the  spectrum  of  Hm  into  two  sub-matrices.  The  unwanted  portion  is  in 
the  leading  corner  and  the  remaining  eigenvalues  of  Hm  are  in  the  other  block.  A 
solution  Z  exists  when  the  Hi  and  Hm-i  do  not  have  a  common  eigenvalue.  If  there 
is  an  eigenvalue  is  shared  by  Hi  and  Hm-i ,  then  Hm  has  an  eigenvalue  of  multiplicity 
greater  than  one.  The  remedy  is  a  criterion  that  determines  whether  to  increase  or 
decrease  i,  the  number  of  Ritz  values  that  require  purging.  Analysis  similar  to  that  in 
section  5  demonstrates  that  after  line  3  the  Ritz  estimates  for  the  eigenvalues  of  Hm-i 

are  not  altered.  We  also  remark  that  Rm-i  is  nonsingular  since  the  matrix 

is  of  full  column  rank  and  that  Ip^-im-il  <  1. 


7  Error  Analysis 


This  section  examines  the  numerical  stability  of  the  two  deflation  algorithms  when 
computing  in  finite  precision  arithmetic.  A  stable  algorithm  computes  the  exact  solu¬ 
tion  of  a  nearby  problem.  It  will  be  shown  that  Algorithms  6.3  and  6.2  deflate  slightly 
perturbed  matrices. 


For  ease  of  notation  H  = 


replaces  Hm  £  RTOXm  USed  by  procedures 


H  n  #12 
H 21  H 22 

Lock  and  Purge  of  §  6.2.  The  sub-matrix  H\\  is  of  order  i  and  i/2 1  is  zero  except  for 
the  sub-diagonal  entry  of  H  located  in  the  north-east  corner.  Analogously,  H  repre¬ 
sents  H  after  the  similarity  transformation  performed  by  Lock  or  Purge,  partitioned 
conformably. 


7.1  Locking 

The  locking  scheme  is  considered  successful  if  the  desired  eigenvalues  end  up  in  Hn 
and  H2 i  is  small  in  norm.  The  largest  source  of  error  is  from  computing  an  orthogonal 
factorization  from  the  approximate  eigenvector  matrix  containing  the  vectors  to  be 
locked. 

The  matrix  pair  (Ar,  D)  represents  an  approximate  quasi-diagonal  form  for  H .  The 
computed  eigenvalues  of  H  are  on  the  diagonal  of  D  if  they  have  zero  imaginary  com¬ 
ponent  and  in  blocks  of  two  for  the  complex  conjugate  pairs.  The  computed  columns 
of  X  span  the  right  eigenspace  corresponding  to  diagonal  values  of  D.  For  the  blocks 
of  order  two  on  the  diagonal  the  corresponding  complex  eigenvector  is  stored  in  two 
consecutive  columns  of  A",  the  first  holding  the  real  part,  and  the  second  the  imaginary 
part.  We  assume  that  A'  is  a  non-singular  matrix  and  that  each  column  is  a  unit  vector. 

Standard  results  give  \\DX  —  H X\\  <  Ci  ||  A/ 1|  where  e\  is  a  small  multiple  of  machine 
precision  for  a  stable  algorithm.  Defining  the  matrix  E  —  ( DX  —  H X)YT  where 
A-1  =  YT  it  follows  that  (H  +  E) X  —  X D.  If  cr~1(A')  is  the  smallest  singular  value 
of  X  then  ||A_1||  =  a^(X).  Since  each  column  of  X  is  a  unit  vector,  ||X||  <  y/rn.  If 
k(A')  =  ||  AF  ||  ||  AT-1 1|  is  the  condition  number  for  the  matrix  of  approximate  eigenvectors, 
||#||  <  €i/c(Af )||//j|.  If  A'  is  a  well  conditioned  matrix  then  the  approximate  quasi¬ 
diagonal  form  for  H  is  exact  for  a  nearby  matrix.  In  particular,  if  H  is  symmetric  then 
E  is  always  a  small  perturbation.  As  the  columns  of  X  become  linearly  dependent, 
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Gm{X)  decreases  and  E  may  represent  a  large  perturbation. 

The  following  result  informs  us  that  locking  is  a  conditionally  stable  process. 


Theorem  7.1  Let  H  £  R” 


eigenvalues.  Suppose  that  X 


X\  Xi  and  D 


are  an  approximate 


be  an  unreduced  upper  Hessenberg  matrix  with  distinct 

Di  0 
0  D2 

L. 

quasi- diagonal  form  for  H  that  satisfies  ( H  +  E)X  =  XD  where  ||£j|  <  e1K(A')||f7||. 
Let  Q  i  R  i  =  A  i  £  Rmx-7  where  Q\Q  i  =  Ij.  If  the  QR  factorization  of  X\  is  computed 
using  Householder  reflectors  then  QR  =  X^  +  E  where  QTQ  =  Im  and  ||£||  <  e2 || - 
Both  e\  and  e2  are  small  multiples  of  the  machine  precision  cm-  Let  e  —  max( £1,262) 
and  let  k(R^)  =  ||f2i||||f?j'1||  be  the  condition  number  for  R]  where 


k(R\) 

1  -  €2k(#i)‘ 


If  0  =  e[n(X)  +  ep(l  -)-  epn^Ri)))  <  1  then  there  exists  a  matrix  C  £  RmXm  such 
that 

Qt(H  -C)Q  =  H=  f  H~ 12 1 , 

U  Ji  22 

where  Hu  is  an  upper  quasi-triangular  matrix  similar  to  D]  and 


(7.1) 


IICII  <  e(K(X)  +  ix)\\H\\  +  0(e2). 


A  few  remarks  are  in  order. 

1.  If  H  is  symmetric  //12  =  0  and  H\\  is  diagonal.  Procedure  Lock  is  stable  since 
noted  previously,  k(X)  =  1  and  p,  a  1. 

2.  If  only  one  column  is  locked,  then  p  =  1  +  0(e)  and  ||C||  is  small  relative  to 


3.  If  k{R\)  is  large,  the  columns  of  X\  are  nearly  dependent.  In  this  case,  k(X) 
will  also  be  large  and  locking  introduces  no  more  error  into  the  computation  than 
already  present  from  computing  the  quasi-diagonal  pair  (X,D).  The  factor  of  p 
may  be  minimized  by  decreasing  j  the  number  of  columns  locked. 

4.  A  conservative  strategy  locks  only  one  vector  at  a  time.  The  only  real  concern  is 
when  locking  two  vectors  corresponding  to  a  complex  conjugate  pair.  If  the  real 
and  imaginary  part  of  the  complex  eigenvector  are  nearly  aligned,  p  will  be  large 
and  locking  may  be  unstable.  But  as  §  6.1  explains,  the  complex  conjugate  pair 
may  be  numerically  regarded  as  a  double  eigenvalue  with  zero  imaginary  part. 
Only  one  copy  is  deflated  and  p  «  1. 


The  i  columns  of  Afi  are  a  basis 


Proof : 

Partition  X  =  \  X\  X2  }  and  D  =  ^  ^ 

L  J  0  U2 

for  the  right  eigenspace  to  be  locked  and  D\  contains  the  corresponding  eigenvalues. 

We  assume  that  the  eigenvalues  of  D\  and  D2  are  distinct  and  that  X  is  non-singular. 
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Let  Y1  = 


Y-. 7 

y? 


denote  the  inverse  of  X .  The  rows  of  span  the  left  eigenspace 


associated  with  X\  and  D\. 

Let  the  product  QR  be  an  exact  QR  factorization  of  a  matrix  near  X\\  QR  — 
"  Ri  " 


Q i  Qi 


o 


=  X\  +E  where  ||£||  <  62||Xi||.  Using  Theorem  1.1  ofStewart  [36], 


since  \\R1 1 1|||£||  <  9  <  1  there  exists  matrices  W\  G  RmXj  and  F\  G  R-7*-7  such 


that  (Q i  +  W\)(R\  +  F\)  =  Q\R\  where  QR  =  Qi  Q 


Ri 

o 


X\  and  (Qi  + 


WxfiQ^  +  WQ  =  Ij.  Define  F 


F\ 

0 


and  W 


W\  0  .  The  matrices  W  and  F 


are  the  perturbations  that  account  for  the  backward  error  E  produced  by  computation. 
Partitioning  W  conformably  with  Q  gives 

QtHQ  =  QTXDYTQ  -qteq, 

=  QT(XlDlYlT  +  X2D2Y J )Q  -  QtEQ , 

QT 


(7.2) 


QT 


{XxDxYj  +  X2D2Y2J)  Q,  Q2  + 


Wt(XiD,Y?  +  X2D2Y2t)  Qx  Q2  + 


QT 

QT 


(XxDxY?  +  X2D2Y2t)W  -  QtEQ , 


where  the  second  order  terms  involving  W  are  ignored.  From  the  decomposition  X\  = 
Q\R\  it  follows  that  Q\  —  XiR]*1  which  gives  Q\  X\  -  0.  The  equality  Yt  —  X-1 
implies  that  Y[T Xi  —  I  for  l  —  1,2  and  Yj X \  —  0  =  Y^  X 2  and  hence  Y2 Q\  =  0. 
Using  these  relationships,  equation  (7.2)  becomes 


(7.3) 

(7.4) 


qthq 


RxDxRx1  Q\XDYrQ2 
0  Q2  X2D2Y2  q2 


+  c, 


H  +  C, 


where  the  matrix  C  absorbs  the  three  matrix  products  involving  W  or  E  on  the  right 
hand  side  of  equation  (7.2).  We  note  that  if  H  is  symmetric,  QfX 2  —  0  =  Y^ Q 2, 
R\  is  a  diagonal  matrix  and  hence  R\D\Rj  =  D\.  Thus  H  is  also  a  symmetric 
matrix.  Defining  C  =  QCQT  equation  (7.4)  is  rewritten  as  QT(H  —  C)Q  =  H .  Since 
QH  =  (X\D\Y[r  +  X2D2Y2)Q  and  using  the  definition  of  C  from  equation  (7.2), 

(7.5)  C  =  WtQH  +  QtWH  -QtEQ, 

it  follows  that  ||C||  <  2||l/UrQ||||^||  +  ||jE||.  The  result  of  Theorem  1.1  of  Stewart  [36] 
also  allows  the  estimate 


\\wtq\\  <  \\w\\  <£2M1  +  w<Ri)), 

where  0(e3)  terms  are  ignored.  For  modest  values  of  /r,  W  is  numerically  orthogonal 
to  Q.  From  equation  (7.5) 

lie'll  =  IICII, 
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<  2c2/x(1  +  €2/iK(JR1))||^||  +  eiK(X)\\H\\, 

<  2e2/r(l  +  e2^K(R1))(\\H\\  +  ||C||)  +  e1K(X)\\H\\, 

<  £(*(*)  +  Ml  +  ^K(i?1)))||i/||  +  en(l  +  e/*«(JRi))||C'||, 

=  0p/|Ml|c||, 

where  the  second  inequality  uses  equation  (7.4).  Since  9  <  0,  rearranging  the  last 
inequality  gives  ||C||(l-0)  <  6\\H\\.  Ignorning  0(62)  terms  ||C||  <  0\\H\\.  The  estimate 
on  the  size  of  C  in  equation  (7.1)  now  follows  since  9  —  c(k(X)  +  /i(l  +  efin(X)))  < 
e{K(X )  +  n)  +  0(c2). 

□ 


7.2  Purging 

The  success  of  the  purging  scheme  depends  upon  the  solution  of  the  Sylvester  set  of 
equations  required  by  Algorithm  6.3.  We  rewrite  the  Sylvester  set  of  equations  in 
Algorithm  6.3  as  ZH22  —  R\\Z  =  H\2.  The  job  is  to  examine  the  effect  of  performing 
the  similarity  transformation  RH22R~l  where 


The  last  relation  implies  that  R~l  =  Q2  ■  In  actual  computation,  this  equality  obviates 
the  need  to  solve  linear  systems  with  R  necessary  for  the  similarity  transformation.  For 
the  error  analysis,  that  follows  R~ 1  is  used  in  a  formal  sense. 

Let  Z  be  the  computed  solution  to  the  Sylvester  set  of  equations.  In  a  similar 
analysis,  Bai  and  Demmel  [3]  assume  that  the  QR  factorization  of  S  is  performed 
exactly  and  we  do  also.  r  "he  major  source  of  error  is  that  arising  from  computing  Z. 

„  „  v 

Suppose  that  QR,  =  J  =  S.  Write  Z  —  Z  +  E  where  E  is  the  error  in  Z.  If 

QR  —  S  and  ||f7-1||||_E,||  <  1,  then  Theorem  1.1  of  Stewart  [36]  gives  matrices  W  and 
F  such  that  (Q  +  W)(R.  +  F)  =  QR  where  (Q  +  W)T(Q  +  W)  —  /m.  The  result  gives 
the  bound  ||P||  <  ||f2||||_E||  +  0(||f?||2).  Up  to  first  order  perturbation  terms, 

rh22r-1  =  (r  +  f)h22{r  +  f)~1  =  rh22r~1  +  rh22r-'fr~'  +  FH22R~ 1 . 

Defining  the  error  matrix  C  =  H22R~lF  +  R~lFH22  it  follows  that 

RH22R-x  =  R{H22  +  C)R~X. 


Ignoring  second  order  terms,  we  obtain  the  estimate 

||C||<2||JR-1||||JF||||/f22||<2K(5)||F;||||i722 


The  invariance  of  ||'||  under  orthogonal  transformations  gives  k(S)  =  || R  1||||JJ||.  Since 
the  singular  values  of  S  are  the  square  roots  of  the  eigenvalues  of  ST S  it  follows  that 


k(S) 


1  + 

l  +  (Z) 
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where  omax(Z )  and  crmax(Z )  are  the  largest  and  smallest  singular  values  of  Z.  Since 
ZT Z  is  a  symmetric  positive  semi-definite  matrix,  A max{ZT  Z)  =  \\Z\\ 2,  and  then 
k(S)  <  y/1  +  \\Z\\2,  with  equality  if  zero  is  an  eigenvalue  of  ZT Z. 

The  previous  discussion  is  summarized  in  the  following  result. 

Theorem  7.2  Let  Z  be  the  computed  solution  to  the  Sylvester  set  of  equations,  ZH22~ 
H\\Z  —  H\2,  where  the  eigenvalues  of  H\\  and  H22  are  distinct.  Let  Z  =  Z  +  E  where 

Z 

E  is  the  error  in  Z  and  suppose  that  ||f?,_1||||£||  <  1  where  QR  = 

Then  there  exists  a  matrix  C  such  that 


RH22R-1  =  r(h22  +  c)r-\ 


where 

(7-6)  HCII  <  2^1  +  ||^|P  H^ll  \\H\\. 

If  ||JS||  is  a  modest  multiple  of  machine  precision  and  the  solution  of  the  Sylvester’s 
equations  is  not  large  in  norm,  then  purging  is  backward  stable  since  ||C||  is  small 
relative  to  \\H  ||. 

The  two  standard  approaches  [5,  18]  for  solving  Sylvester’s  equation  show  that 
II^IIf  <  e3(||i/n||f  +  ||.//22||f)||2!||f  where  F  =  H]2  -  ZH22  +  H\\Z  and  e3  is  a 
modest  multiple  of  machine  precision.  Standard  bounds  [10,  17]  also  give  \\Z\\p  < 
sep~l(Hu,H22)\\Hi2\\F  where 


sep  (Hn,H22) 


min 

A'^o 


\\XH22-HnX\\F 

II^IIf 


is  the  separation  between  // n  and  H22.  Although 


sep(//n ,  H22)  <  min  |Afc(f/„)  -  A/(//22)|, 

kj 


Varah  [38]  indicates  that  if  the  matrices  involved  are  highly  non-normal,  the  smallest 
diflference  between  the  spectrums  of  H n  and  H22  may  be  an  over  estimate  of  the 
actual  separation.  Recently,  Higham  [21]  gives  a  detailed  error  analysis  for  the  solution 
of  Sylvester’s  equation.  The  analysis  takes  into  account  the  special  structure  of  the 
equations  involved.  For  example,  Higham  shows  that  ||£||f  <  sep~1(Hu,  H22)\\F\\f 
but  this  may  lead  to  an  arbitrarily  large  estimate  of  the  true  forward  error.  For  use  in 
practical  error  estimation,  “LAPACK-style”  software  is  available. 

A  robust  implementation  of  procedure  Lock  determines  the  backward  stability  by 
estimating  both  ||Z||  and  ||£’||. 


8  Other  Deflation  Techniques 

Saad  [31]  discusses  several  deflation  strategies  used  with  both  Arnoldi’s  method  and 
simultaneous  iteration.  Algorithm  6.2  is  an  in  place  version  of  one  of  these  schemes1. 
Saad’s  version  explicitly  orthonorm alizes  the  newly  converged  Ritz  vectors  against  the 

'Algorithm  6.4,  page  181  of  [31] 
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already  computed  approximate  j  Schur  vectors.  This  is  the  form  of  locking  used  by 
Scott  [32].  Instead,  procedure  Lock  achieves  the  same  task  implicitly  through  the  use 
of  Householder  matrices  in  RmXTO.  Thus  we  are  able  to  orthogonalize  vectors  in  Rn 
at  a  reduced  expense  since  m  <C  n.  As  Saad  notes,  the  Arnoldi  factorization  (6.5) 
is  equivalent  to  applying  Arnoldi’s  method  to  the  matrix  (/  —  VjVj'  )A  with  the  first 
column  of  Vm~3  as  the  starting  vector. 

Other  deflation  strategies  include  the  various  Wielandt  deflation  techniques  [31]. 
We  briefly  review  those  that  do  not  require  the  approximate  left  eigenvectors  of  A  or 
complex  arithmetic.  Denote  by  Aj, . . . ,  A*,  the  wanted  eigenvalues  of  A.  The  Wielandt 
and  Schur-Wielandt  forms  of  deflation  determine  a  rank  j  modification  of  A, 

(8.1)  Aj  =  A-UfoUf, 

where  E  j  G  RJXJ  is  a  diagonal  matrix  of  shifts.  The  value  of  j  represents  the  dimension 
of  the  approximate  invariant  subspace  already  computed.  The  idea  is  to  choose  shifts 
so  that  Aj  will  converge  to  the  remainder  of  the  invariant  subspace  desired. 

Both  forms  of  deflation  differ  in  the  choice  of  Uj.  The  Wielandt  variant  uses  con¬ 
verged  Ritz  vectors  while  the  Schur-Wielandt  uses  the  approximate  Schur  vectors. 
With  either  form  of  deflation,  the  eigenvalues  of  Aj  are  \l  —  ot  for  i  <  j  and  A,-  oth¬ 
erwise  and  both  forms  leave  the  Schur  vectors  unchanged.  Braconnier  [8]  employs  the 
Wielandt  variant  and  discusses  the  details  of  deflating  a  converged  Ritz  value  that  has 
nonzero  imaginary  part  in  real  arithmetic. 

The  cost  of  matrix  vector  products  with  Aj  increases  due  to  the  rank  j  modifications 
of  A  required.  Additionally,  every  time  an  approximate  Schur  vector  or  a  Ritz  vector 
converges,  the  iteration  needs  to  be  explicitly  restarted  with  Aj.  The  two  deflation 
techniques  introduced  in  this  paper  allow  the  iteration  to  be  implicitly  restarted — 
avoiding  the  need  to  build  a  new  factorization  from  scratch. 

The  idea  of  deflating  a  converged  Ritz  value  from  a  Lanczos  iteration  is  also  dis¬ 
cussed  by  Parlett  and  Nour-Omid  [29],  They  present  an  explicit  deflation  technique  by 
using  the  QR  algorithm  with  converged  Ritz  values  as  shifts.  Parlett  indicates  that  this 
was  a  primary  reason  for  undertaking  the  study  concerning  the  forward  instability  of 
the  qr  algorithm  [28]. 


9  Reordering  the  Schur  Form  of  a  Matrix 


We  now  establish  a  connection  between  the  iRA-iteration  with  locking  and  the  algo¬ 
rithms  used  to  re-order  the  Schur  form  of  a  matrix.  Suppose  a  matrix  A  is  reduced  to 
upper  quasi- triangular  form  by  the  QR  algorithm  : 


(9.1) 


QTAQ  =  T, 

_  -I'll  T\2 
T22 


where  Q  is  the  orthogonal  matrix  computed  by  the  algorithm.  Equation  (9.1)  is  a 
Schur  form  for  A  of  order  p  +  q  where  the  sub-matrices  T\\  and  T22  are  of  order  p  and 
q,  respectively.  Assume  that  the  spectrums  of  T\\  and  T22  are  distinct.  In  practice,  the 
order  in  which  the  computed  eigenvalues  of  A  appear  on  the  diagonal  of  T  is  somewhat 
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random.  The  first  p  columns  of  Q  are  an  orthogonal  basis  for  the  unique  invariant 
subspace  corresponding  to  the  eigenvalues  of  T\\.  If  eigenvalues  of  interest  are  located 
in  T2 2  and  an  orthonormal  basis  for  them  is  wanted  then  we  must  either  increase  the 
number  of  columns  of  Q  used  or  somehow  place  them  at  the  top  of  T.  Algorithms  for  re¬ 
ordering  a  Schur  form  accomplish  this  task  by  using  orthogonal  matrices  that  move  the 
wanted  eigenvalues  to  the  top  of  T.  The  recent  work  of  Bai  and  Demmel  [3]  attempts 
to  correct  the  occasional  numerical  problems  encountered  by  Stewart’s  algorithm  [35] 
EXCHNG.  Both  algorithms  swap  consecutive  lxl  and  2x2  blocks  of  a  quasi-triangular 
matrix  to  attain  the  desired  ordering. 

Let  both  Tii  and  T22  of  equation  (9.1)  be  matrices  of  at  most  order  two.  When 
swapping  adjacent  blocks  of  order  one,  p  =  1  =  q,  EXCHNG  constructs  a  plane  rotation 
that  zeros  the  second  component  of  the  eigenvector  corresponding  to  the  eigenvalue 
A2  =  T22.  A  similarity  transformation  is  performed  on  T  with  the  plane  rotation  and 
the  diagonal  blocks  are  interchanged.  We  refer  to  a  strategy  that  constructs  an  orthog¬ 
onal  matrix  and  performs  a  similarity  transformation  to  interchange  the  eigenvalues 
as  a  direct  swapping  algorithm.  Consider  the  following  alternate  iterative  swapping 
algorithm:  Perform  a  similarity  transformation  on  T  with  an  arbitrary  orthogonal  ma¬ 
trix  followed  by  one  step  of  the  QR-iteration  with  shift  equal  to  A2.  The  arbitrary 
orthogonal  similarity  transformation  introduces  a  non-zero  off-diagonal  element  in  the 
2, 1  entry  so  that  the  transformed  T  is  an  unreduced  upper  Hessenberg  matrix  with  the 
diagonal  blocks  now  coupled.  The  standard  convergence  theory  of  the  qr  algorithm 
dictates  that  Aj  and  A2  are  switched  and  the  2, 1  entry  is  zero.  If  the  order  of  T22  is 
equal  to  two,  EXCHNG  uses  the  iterative  swapping  strategy  using  a  standard  double 
shift  to  re-order  the  diagonal  blocks.  The  direct  swapping  algorithm,  instead,  computes 
an  appropriate  orthogonal  matrix  by  computing  the  QR  factorization  of  a  basis  of  two 
vectors  that  span  the  desired  invariant  subspace.  For  example  the  factorization  used  in 
equation  (6.1)  in  §  6.1  may  be  used.  The  reader  is  referred  to  [3,  14]  for  further  details. 

The  iterative  swapping  algorithm  is  equivalent  to  the  implicit  restarting  technique 
used  by  the  iRA-iteration  since  both  depend  upon  an  implicitly  shifted  qr  step  applied 
to  an  unreduced  upper  Hessenberg  matrix  to  interchange  Tn  and  T22.  The  direct 
swapping  algorithm  is  equivalent  to  the  locking  technique.  An  orthogonal  matrix  is 
constructed  from  a.  basis  for  the  invariant  subspace  corresponding  to  T22.  When  this  is 
applied  as  a  similarity  transformation  the  diagonal  blocks  of  T  are  swapped.  In  exact 
arithmetic,  both  swapping  variants  result  in  a  matrix  that  is  upper  quasi-triangular 
with  the  blocks  interchanged. 

The  following  example  demonstrates  that  the  two  variants  may  produce  drastically 
different  output  matrices  when  computed  in  floating  point  arithmetic.  The  following 
experiment  was  carried  out  in  MATLAB,  Version  4.2a,  on  a  SUN  SPARC  station  IPX. 
The  floating  point  arithmetic  is  IEEE  standard  double  precision  with  machine  precision 
of  eM  =  2-52  a  2.2204  •  10“16.  Let 


T  = 


1  +  10  cm  1 
0  1 


An  eigenvector  corresponding  to  A2  =  1  is 
that  transforms  this  eigenvector  to  a  multip 


-1 
10  cm 


.  Denote  by  Z  the  plane  rotation 


le  of  the  first  column  of  the  identity  matrix 
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in  R2x2.  Let 


U  = 


1  -5  €m 

10  €m  1 


so  that  U  is  orthogonal  to  a  small  multiple  of  machine  precision.  The  matrix  U  acts 
as  the  arbitrary  orthogonal  transformation  required  by  the  iterative  algorithm.  Let  T 
denote  the  matrix  computed  by  performing  one  step  of  the  explicit  QR-iteration  to  the 
matrix  UtTU  with  shift  equal  to  Ai  =  1  T  10cm-  The  two  computed  matrices  are: 


ZtTZ 

T 


1  -1 

0  1  +  l(kM  J  ’ 

1.400000000000003  -7.999999999999996- 10-1 

2.000000000000002 • 10"1  6.000000000000001 • 10-1 


The  computed  eigenvalues  of  Z  are  1.000000033320011  and  9.999999666799921  •  10" 
which  both  lost  eight  digits  of  accuracy.  If  we  perform  another  explicit  QR-step  on 

”  1.000000000000003  1.000000000000001  " 

«  1.09  •  10-15  1  I  IS  com¬ 


the  matrix  T  with  the  same  shift, 


puted.  Note  that  the  off-diagonal  element  is  slightly  larger  than  machine  precision  so 
that  a  standard  QR  algorithm  does  not  set  it  to  zero.  But  even  if  the  off-diagonal 
element  is  set  to  zero,  the  iterative  swapping  algorithm  fails  to  interchange  the  eigen¬ 
values.  Continuing  to  apply  explicit  QR-steps  with  the  shift  equal  to  Aj  does  not  result 
in  a  properly  interchanged  matrix. 

The  explanation  why  the  iterative  algorithm  fails  to  work  is  simple  enough.  The 
matrix  T  constructed  is  poorly  conditioned  with  respect  to  the  eigenvalue  problem  since 
the  eigenvectors  are  nearly  aligned.  The  eigenvalues  of  UtTU  are  1.000000033320011 
and  9.999999666799921  •  10-1.  Thus  the  small  relative  errors  on  the  order  of  machine 
precision  that  occur  when  computing  UtTU  produce  a  nearby  matrix  in  which  both 
the  eigenvalues  differ  by  eight  digits  of  accuracy.  Performing  an  explicitly  shifted  qr 
step  with  Aj  incurs  forward  instability  since  the  last  components  of  the  eigenvectors 
for  UtTU  are  on  the  order  of  ^/f-M  ■  This  is  the  necessary  and  sufficient  condition  of 
Parlett  and  Le  [28].  Another  qr  step  with  the  same  shift  on  T  almost  zeros  out  the 
sub-diagonal  element  since  the  last  components  of  the  eigenvectors  for  t  are  order  10-1 
and  the  shift  is  almost  the  average  of  the  eigenvalues  of  T  and  quite  close  to  both. 

Bai  and  Demmel  [3]  present  an  example  which  compares  their  direct  swapping 
approach  with  Stewart’s  algorithm  EXCHNG.  The  matrix  considered  is 


7.001  -87  39. 4t  22.2r 

,,  s  5  7.001  —12. 2r  36. Or 

A(t)  — 

v  0  0  7.01  -11.7567  ' 

0  0  37  7.01 

When  r  =  10,  ten  iterations  QR-iterations  are  required  to  interchange  the  two  blocks. 
As  before,  the  eigenvalues  undergo  a  loss  of  accuracy.  The  iterative  swapping  algorithm 
fails  for  the  matrix  A(100).  The  explanation  for  the  failure  is  the  same  as  for  the 
previous  example.  Using  a  direct  algorithm,  the  eigenvalues  of  A(10)  and  A(100)  are 
correctly  swapped  and  the  eigenvalues  lose  only  a  tiny  amount  of  accuracy. 
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1.  Initialize  an  Arnoldi  factorization  of  length  k 

2.  Main  Loop 

3.  Extend  an  Arnoldi  factorization  to  length  k  +  p 

4.  Check  for  convergence 

Exit  if  k  wanted  Ritz  values  converge 

Let  i  and  j  denote  the  wanted  and  unwanted  converged 

Ritz  values,  respectively 

5.  Lock  the  i  +  j  converged  Ritz  values 

6.  Implicit  application  of  shifts  resulting  in  an 
Arnoldi  factorization  of  length  k  +  j 

7.  Purge  the  j  unwanted  converged  Ritz  values. 


Table  1:  Formal  description  of  an  iRA-iteration 


Bai  and  Demmel  presents  a  rigorous  analysis  of  their  direct  swapping  algorithm. 
Although  backward  stability  is  not  guaranteed,  it  appears  that  only  when  both  Tn  and 
T22  are  both  of  order  two  and  have  almost  indistinguishable  eigenvalues  [7]  is  stability 
lost.  In  this  case,  the  interchange  is  not  performed.  Bojanczyk  and  Van  Dooren  [7] 
present  an  alternate  swapping  algorithm  that  appears  to  be  backward  stable. 

10  Numerical  Results 

An  IRA-iteration  using  the  two  deflation  procedures  of  section  6.2  was  written  in 
MATLAB,  Version  4.2a.  An  informal  description  given  parameters  k  and  p  is  given  in 
Table  1.  The  codes  are  available  from  the  first  author  upon  request.  A  high-quality  and 
robust  implementation  of  the  deflation  procedures  is  planned  for  the  Fortran  software 
package  ARPACK  [25]. 

In  the  examples  that  follow  Q ^  and  Rk  denote  the  approximate  Schur  factors  for  an 
invariant  subspace  of  order  k  computed  by  an  IRA-iteration.  All  the  experiments  used 
the  starting  vector  equal  to  randn(n,  1)  where  the  seed  is  set  with  randn(  ’  seed’ ,  0) 
and  n  is  the  order  of  the  matrix.  The  shifting  strategy  uses  the  unwanted  eigenvalues 
of  Hk+p  that  have  not  converged.  An  eigenpair  (6,  y)  of  H^+p  is  accepted  if  its  Ritz 
estimate  (2.2)  satisfies, 

l10-1)  \ek+Py\  \\fk+p\\  <  r]\e\. 

The  value  of  77  is  chosen  according  to  the  relative  accuracy  of  the  Ritz  value  desired. 

10.1  Example  1 

The  first  example  illustrates  the  use  of  the  deflation  techniques  when  the  underlying 
matrix  has  several  complex  repeated  eigenvalues.  The  example  also  demonstrates  how 
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IRA-iteration  for  C450 

k  =  12  and 

p  =  16  with  convergence  tolerance 

is  p  =  10  10 

Iteration 

Ritz  values  Locked  Ritz  values  Purged 

9 

2 

0 

10 

2 

0 

12 

2 

0 

13 

2 

0 

17 

2 

0 

21 

0 

2 

24 

2 

0 

28 

0 

2 

31 

2 

0 

Totals 

14 

4 

Number  of  matrix  vector  products 

436 

||C,450<3 12  -  Q12-R12II  ~  10  12 

\\Q  12^ 45oQ \  2  ~ 

R12W « 10-11 

1  QiiQ  12  - 1 

12II  *  10-14 

\\D\2  -  A12I 

00  *  10-15 

Table  2:  Convergence  history  for  Example  one 

the  iteration  locks  and  purges  blocks  of  Ritz  values  in  real  arithmetic.  A  block  diagonal 
matrix  C  was  generated  having  n  blocks  of  order  two.  Each  block  was  of  the  form 

&  Vi 
.  ~r]l  &  ’ 

where 


4  sin2( 


m 


2(n  +  1) 


)  +  4sin2( 


JTf 


2  {n  +  1) 


). 


for  1  <  i,j  <  n  and  rji  =  y/fj.  The  eigenvalues  of  C  are  £/  ±  rpi  where  i  =  a/~1-  Since 
the  eigenvalues  of  a  quasi-diagonal  matrix  are  invariant  under  orthogonal  similarity 
transformations,  using  an  iRA-iteration  on  C  with  a  randomly  generated  starting  vector 
is  general.  An  IRA-iteration  was  used  to  compute  the  k  =  12  eigenvalues  of  C450  with 
smallest  real  part.  The  number  of  shifts  used  was  p  =  16  and  the  convergence  tolerance 
V  was  set  equal  to  10-10.  With  these  choices  of  k  and  p,  the  iteration  stores  at  most 
twenty  eight  Arnoldi  vectors. 

There  are  four  eigenvalues  with  multiplicity  two.  Table  2  shows  the  results  attained. 
Let  the  diagonal  matrix  D\2  denote  the  eigenvalues  of  the  upper  triangular  matrix  R\2 
computed  by  the  iteration.  The  diagonal  matrix  A12  contains  the  wanted  eigenvalues. 
After  twenty  four  iterations  twelve  Ritz  values  converged.  But  the  pair  of  Ritz  val¬ 
ues  purged  at  iteration  twenty  one  was  a  previously  locked  value  which  the  iteration 
discarded.  This  behavior  is  typical  when  there  are  clusters  of  eigenvalues. 
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10.2  Example  2 

Consider  the  eigenvalue  problem  for  the  convection-diffusion  operator, 

-A  u(x,y)  + p(ux(x,y)+ uy(x,y))  =  A  u(x,y), 

on  the  unit  square  [0, 1]  x  [0, 1]  with  zero  boundary  data.  Using  a  standard  five-point 
scheme  with  centered  finite  differences,  the  matrix  Ln2  that  arises  from  the  discretiza¬ 
tion  is  of  order  n 2  where  h  —  1  /(n  +  1)  is  the  cell  size.  The  eigenvalues  of  Ln 2  are 

\i3  =  2 i/l  -  7cos(-^— )  +  2\/l  -  7cos(^— ), 

for  1  <  i,j  <  n  where  7  =  ph/ 2.  An  iRA-iteration  was  used  to  compute  the  k  =  6 
smallest  eigenvalues  of  where  p  =  25.  The  number  of  shifts  used  was  p  =  10  and 
the  convergence  tolerance  rj  was  set  equal  to  10-8.  With  these  choices  of  k  and  p,  the 
iteration  stores  at  most  sixteen  Lanczos  vectors.  Let  the  diagonal  matrix  De  denote  the 
eigenvalues  of  the  upper  triangular  matrix  R6  computed  by  the  iteration.  The  diagonal 
matrix  A&  €  Rbx6  contains  the  six  smallest  eigenvalues.  We  note  that  there  are  two 
eigenvalues  with  multiplicity  two.  Table  3  shows  the  results  attained.  The  diagonal 
matrix  Dq  approximates  A6.  After  thirty  iterations  six  Ritz  values  converged.  But  the 
Ritz  value  purged  at  iteration  twenty  four  was  a  previously  locked  value.  The  other 
purged  Ritz  values  are  approximations  to  the  eigenvalues  of  +625  larger  than  Xq. 

Figure  6  gives  a  graphical  interpretation  of  the  expense  of  an  IRA-iteration  in  terms 
of  matrix  vector  products  when  the  value  of  p  is  increased.  For  all  values  of  p  shown, 
the  results  of  the  iteration  were  similar  to  those  of  Table  3.  The  results  presented 
in  Table  3  correspond  to  the  value  of  p  that  gave  the  minimum  number  matrix  vector 
products.  For  the  value  of  p  =  1,  the  iteration  converged  to  the  five  smallest  eigenvalues 
after  nine  hundred  ninety  nine  matrix  vector  products.  But  the  iteration  was  not  able 
to  converge  to  the  second  copy  of  A5.  For  p  =  2,  the  only  form  of  deflation  employed 
was  locking.  All  others  values  of  p  shown  demonstrated  similar  behavior  to  that  of 
Table  3. 

In  order  to  determine  the  benefit  of  the  two  deflation  techniques,  experiments  were 
repeated  without  the  use  of  locking  or  purging.  In  addition,  all  the  unwanted  Ritz 
values  were  used  as  shifts,  converged  or  not.  The  first  run  used  the  same  parameters 
as  given  in  Table  3.  After  210  matrix  vector  products,  the  iteration  converged  to  six 
Ritz  values.  But  the  second  copy  of  the  fifth  smallest  eigenvalue  was  not  among  the 
final  six.  The  value  of  p  was  increased  to  twenty  three  with  the  same  results. 


10.3  Example  3 


The  following  example  shows  the  behavior  of  the  iteration  on  a  matrix  with  a  very  ill 
conditioned  basis  of  eigenvectors.  Define  the  Clement  tridiagonal  matrix  [22]  of  order 
n+l 


Bn+ 1 


On  •••  0 

1  0  n  -  1 

0  n  0 
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IRA-iteration  on  Le 25 

k  =  6  and  p 

=  10  with  convergence  tolerance 

is  77  =  10  8 

Iteration 

Ritz  values  Locked  Ritz  values  Purged 

14 

1 

0 

16 

1 

0 

19 

1 

0 

21 

1 

0 

23 

1 

1 

24 

0 

1 

30 

1 

0 

35 

0 

1 

38 

1 

1 

Totals 

7 

4 

Number  of  matrix  vector  products 

325 

||T625<36  —  QsR-e>\\  ~  10  9 

||Q<3  ^625<36  _  -#611  ~  10-9 

WQlQe  -  /ell  «  10-14 

IPe  -  Aelloo  ~  10-7 

Table  3:  Convergence  history  for  Example  two 


Figure  6:  Bar  graph  of  the  number  of  matrix  vector  products  used  by  an  iRA-iteration 
for  Example  2  as  a  function  of  p. 
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IRA-iteration  on  i?moo 

k  —  4  and 

p  —  16  with  convergence  tolerance  is  rj  =  10  6 

Iteration 

Ritz  values  Locked  Ritz  values  Purged 

76 

1 

0 

85 

1 

0 

91 

2 

0 

Totals 

4 

0 

Number  of  matrix  vector  products 

1423 

\\B\qqqQ4  ~  Q4R4H/H-B1OOO 

|  »  10"6 

IIQ4  ^1000<94  —  -fiUll  ~  10  6 

HQ4Q4  -/4||  *  10“14 

\\D4  -  A4 1 1 oo / 1 1 B\ 000 1 1 00  ~  10~6 

Table  4:  Convergence  history  for  Example  three 

The  eigenvalues  are  ±n,±n  —  2,  •••,±1  and  zero  if  n  is  even.  We  note  that  Bn+i  = 
■S’n+i-^n+i'S'n+i  where  5^+1  =  diag(l,  f,  ■  •  • ,  ^f)  is  a  diagonal  matrix.  Thus  the 

condition  number  of  the  basis  of  eigenvectors  for  Bn+1  is  ||Sn+1||  ||5~+1||  which  implies 
that  the  eigenvalue  problem  for  Bn+1  is  quite  ill  conditioned.  An  IRA-iteration  was 
used  to  compute  the  k  =  4  largest  in  magnitude  eigenvalues  of  i?moo-  The  number  of 
shifts  used  was  p  —  16  and  the  convergence  tolerance  r/  was  set  equal  to  10~6.  With 
these  choices  of  k  and  p ,  the  iteration  stores  at  most  twenty  Arnoldi  vectors.  Let  the 
diagonal  matrix  D4  denote  the  eigenvalues  of  the  upper  triangular  matrix  R4  computed 
by  the  iteration.  The  diagonal  matrix  A4  €  R4x4  contains  the  four  largest  in  magnitude 
eigenvalues.  Table  4  shows  the  results  attained. 

Although  the  iteration  needed  a  large  number  of  matrix  vector  products,  the  iter¬ 
ation  was  able  to  extract  accurate  Ritz  values  given  the  convergence  tolerance. 

10.4  Example  4 

Finally,  we  present  a  dramatic  example  of  how  the  convergence  of  an  IRA-iteration 
benefits  from  the  two  deflation  procedures.  A  matrix  T  of  order  ten  had  the  values 

rl  =  10  65  74=2:8  =  *  ■  10  3,Tg:io=l, 

on  the  diagonal.  Since  the  eigenvalues  of  a  matrix  are  invariant  under  orthogonal 
similarity  transformations,  using  an  IRA-iteration  on  T  with  a  randomly  generated 
starting  vector  is  general.  An  IRA-iteration  was  used  to  compute  an  approximation  to 
the  smallest  eigenvalue.  The  number  of  shifts  used  was  p  —  3  and  the  convergence 
tolerance  7;  was  set  equal  to  10~3.  Table  5  shows  the  results  attained. 

Another  experiment  was  run  with  the  locking  and  purging  mechanisms  turned  off. 
Additionally,  all  unwanted  Ritz  values  were  used  as  shifts.  The  same  parameters  were 
used  as  in  Table  5  but  the  iteration  now  consumed  forty  one  matrix  vector  products. 
As  in  the  results  for  Table  5,  the  modified  iteration  converged  to  one  of  the  dominant 
eigenvalues  after  one  iteration.  After  six  iterations,  the  leading  block  of  H4  split  off, 
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IRA-iteration  on  T 

k  —  1  and  p  —  3  with  convergence 

tolerance  is  rj  =  10  3 

Iteration  Ritz  values  Locked 

Ritz  values  Purged 

1  0 

1 

15  1 

1 

Totals  1 

2 

Number  of  matrix  vector  products  32 

\\TQ\  -  Q\R\W/tx 

«  1(T3 

WQ'iTQi  -  RiW/n 

«  10~3 

WQiQi  -  /ill  «  io~15 

\\Ri  ~  riHoo/rj  « 

10"3 

Table  5:  Convergence  history  for  Example  four 


having  converged  to  the  invariant  subspace  corresponding  to  r9:i0.  But  since  purging 
was  turned  off,  the  modified  iteration  had  to  continue  attempting  to  converge  to  t\ 
using  only  the  lower  block  of  order  two  in  H 4.  Incidently,  if  the  iteration  instead 
simply  discarded  the  leading  portion  of  the  factorization  corresponding  to  Tq:\q  after 
the  sixth  iteration,  convergence  to  T\  never  occurred.  Crucial  to  the  success  of  an 
iRA-iteration  is  the  ability  to  deflate  converged  Ritz  values  in  a  stable  manner.  Both 
purging  and  locking  allow  faster  convergence. 

11  Conclusions 

In  the  paper,  we  developed  deflation  techniques  for  an  implicitly  restarted  Arnoldi  iter¬ 
ation.  The  first  technique,  Locking,  allows  an  orthogonal  change  of  basis  for  an  Arnoldi 
factorization  which  results  in  a  partial  Schur  decomposition  containing  the  converged 
Ritz  values.  The  corresponding  Ritz  value  is  deflated  in  an  implicit  but  direct  man¬ 
ner.  The  second  technique,  Purging,  allows  implicit  removal  of  unwanted  converged 
Ritz  values  from  the  Arnoldi  iteration.  Both  deflation  techniques  are  accomplished  by 
working  with  matrices  in  the  projected  Krylov  space  which  for  large  eigenvalue  prob¬ 
lems  is  a  fraction  of  the  order  of  the  matrix  from  which  estimates  are  sought.  Since 
both  deflation  techniques  are  implicitly  applied  to  the  Arnoldi  factorization  the  need 
for  explicit  restarting  associated  with  all  other  deflation  strategies  is  avoided.  Both 
techniques  were  carefully  examined  with  respect  to  numerical  stability  and  computa¬ 
tional  results  were  presented.  Convergence  of  the  Arnoldi  iteration  is  improved  and  a 
reduction  in  computational  effort  is  realized.  The  numerical  examples  demonstrate  how 
the  deflation  techniques  remove  the  requirement  for  a  block  Arnoldi/Lanczos  method 
to  compute  approximations  to  multiple  or  clustered  eigenvalues. 


References 

[1]  E.  Anderson,  Z.  Bai,  C.  Bischof,  J.  Demmel,  J.  Dongarra,  J.  Du  Croz,  A.  Green- 


33 


baum,  S.  Hammarling,  A.  McKenney,  S.  Ostrouchov,  and  D.  Sorensen.  LAPACK 
Users’  Guide.  SIAM,  Philadelphia,  1992. 

[2]  W.  E.  Arnoldi.  The  principle  of  minimized  iterations  in  the  solution  of  the  matrix 
eigenvalue  problem.  Quarterly  of  Applied  Mathematics ,  9:17-29,  1951. 

[3]  Z.  Bai  and  J.  W.  Demmel.  On  swapping  diagonal  blocks  in  real  schur  form.  Linear 
Algebra  and  Its  Applications ,  186:73-95,  1993. 

[4]  Z.  Bai  and  G.  W.  Stewart.  SRRIT  —  A  FORTRAN  subroutine  to  calculate  the 
dominant  invariant  subspace  of  a  rtonsymmetric  matrix.  Technical  Report  2908, 
Department  of  Computer  Science,  University  of  Maryland,  1992. 

[5]  R.  H.  Bartels  and  G.  W.  Stewart.  Algorithm  432:  Solution  of  the  matrix  equation 
AX  +  X B  —  C .  Communications  of  the  ACM ,  15:820-826,  1972. 

[6]  M.  Bennani  and  T.  Braconnier.  Stopping  criteria  for  eigensolvers.  Technical  report, 
CERFACS,  Toulouse,  France,  November  1993. 

[7]  A.  Bojanczyk  and  P.  Van  Dooren.  Reordering  diagonal  blocks  in  the  schur  form. 
In  Linear  Algebra  for  Large  Scale  and  Real  Time  Applications,  NATO  ASI  Series, 
pages  351-352.  Kluwer  Academic  Publishers,  1993. 

[8]  T.  Braconnier.  The  Arnoldi-Tchebycheff  algorithm  for  solving  large  nonsymmetric 
eigenproblems.  Technical  Report  TR/PA/93/25,  CERFACS,  Toulouse,  France, 
1993. 

[9]  D.  Calvetti,  L.  Reichel,  and  D.  C.  Sorensen.  An  implicitly  restarted  lanczos  method 
for  large  symmetric  eigenvalue  problems.  ETNA,  2:1-21,  March  1994. 

[10]  F.  Chatelin.  Eigenvalues  of  Matrices.  Wiley,  1993. 

[11]  F.  Chatelin  and  V.  Fraysee.  Qualitative  computing:  elements  of  a  theory  for  finite- 
precision  computation.  Technical  report,  CERFACS  and  THOMSON-CSF,  June 
1993.  Lecture  Notes  for  the  Commett  European  Course,  June  8-10,  Orsay,  France. 

[12]  J.  Cullum  and  W.  E.  Donath.  A  block  lanczos  algorithm  for  computing  the  q 
algebraically  largest  eigenvalues  and  a  corresponding  eigenspace  for  large,  sparse 
symmetric  matrices.  In  Proceedings  of  the  1974  IEEE  Conference  on  Decision  and 
Control ,  pages  505-509,  New  York,  1974. 

[13]  James  Demmel.  Numerical  Linear  Algebra,  volume  1  of  Berkeley  Mathematics 
Lecture  Notes.  Center  for  Pure  and  Applied  Mathematics,  Department  of  Mathe¬ 
matics,  University  of  California,  Berkeley,  California,  1993. 

[14]  J.  Dongarra,  S.  Hammarling,  and  J.  Wilkinson.  Numerical  considerations  in  com¬ 
puting  invariant  subspaces.  SIAM  Journal  on  Matrix  Analysis  and  Applications, 
13(  1 ):  145  16 1 ,  January  1992. 

[15]  W.  Feller  and  G.E.  Forsythe.  New  matrix  transformations  for  obtaining  charac¬ 
teristic  vectors.  Quart.  Appl.  Math.,  8:325-331,  1951. 


34 


[16]  S.  Godet-Thobie.  Eigenvalues  of  large  highly  nonnormal  matrices.  PhD  thesis, 
University  Paris  IX,  Dauphine,  Paris,  France,  1993. 

[17]  G.  H.  Golub  and  C.  F.  Van  Loan.  Matrix  Computations.  Johns  Hopkins,  second 
edition,  1989. 

[18]  G.  H.  Golub,  S.  Nash,  and  C.  F.  Van  Loan.  A  Hessenberg-schur  method  for  the 
problem  AX  -f  XB  =  C .  IEEE  Transactions  on  Automatic  Control ,  AC-24:909- 
913, 1979. 

[19]  G.  H.  Golub  and  R.  Underwood.  The  block  lanczos  method  for  computing  eigen¬ 
values.  In  J.  R.  Rice,  editor,  Mathematical  Software  III,  pages  361-377,  1977. 

[20]  R.  G.  Grimes,  J.  G.  Lewis,  and  H.  D.  Simon.  A  shifted  block  Lanczos  algorithm 
for  solving  sparse  symmetric  generalized  eigenproblems.  SIAM  Journal  on  Matrix 
Analysis  and  Applications,  15(  1):228 — 272,  January  1994. 

[21]  N.  J.  Higham.  Perturbation  theory  and  backward  error  for  AX  —  X B  =  C .  BIT, 
33:124-136, 1993. 

[22]  N.  J.  Higham.  The  Test  Matrix  Toolbox  for  Matlab.  Numerical  Analysis  Report 
No.  237,  University  of  Manchester,  England,  December  1993. 

[23]  W.  Karush.  An  iterative  method  for  finding  characteristics  vectors  of  a  symmetric 
matrix.  Pacific  Journal  of  Mathematics,  1:233-248,  1951. 

[24]  C.  Lanzcos.  An  iteration  method  for  the  solution  of  the  eigenvalue  problem  of 
linear  differential  and  integral  operators.  Journal  of  Research  of  the  National 
Bureau  of  Standards,  45(4):255-282,  October  1950.  Research  Paper  2133. 

[25]  R.  B.  Lehoucq,  D.  C.  Sorensen,  and  P.  Vu.  ARPACK:  An  implementation  of  the 
Implicitly  Re-started  Arnoldi  Iteration  that  computes  some  of  the  eigenvalues  and 
eigenvectors  of  a  large  sparse  matrix.  Available  from  netlib@ornl.gov  under  the 
directory  scalapack. 

[26]  C.  C.  Paige.  The  computation  of  eigenvalues  and  eigenvectors  of  very  large  sparse 
matrices.  PhD  thesis,  University  of  London,  London,  England,  1971. 

[27]  B.  N.  Parlett.  The  Symmetric  Eigenvalue  Problem.  Prentice-Hall,  1980. 

[28]  B.  N.  Parlett  and  J.  Le.  Forward  instability  of  tridiagonal  QR.  SIAM  Journal  on 
Matrix  Analysis  and  Applications,  14(1):279-316,  1993. 

[29]  B.  N.  Parlett  and  B.  Nour-Oinid.  The  use  of  a  refined  error  bound  when  updating 
eigenvalues  of  tridiagonals.  Linear  Algebra  and  Its  Applications,  68:179-219,  1984. 

[30]  Y.  Saad.  Variations  on  Arnoldi’s  method  for  computing  eigenelements  of  large 
unsymmetric  matrices.  Linear  Algebra  and  Its  Applications,  34:269-295,  1980. 

[31]  Y.  Saad.  Numerical  Methods  for  Large  Eigenvalue  Problems.  Halsted  Press,  1992. 


35 


[32]  J.  A.  Scott.  An  Arnoldi  code  for  computing  selected  eigenvalues  of  sparse  real 
unsymmetric  matrices.  Technical  Report  RAL-93-097,  Rutherford  Appleton  Lab¬ 
oratory,  1993. 

[33]  D.  C.  Sorensen.  Implicit  application  of  polynomial  filters  in  a  k-step  Arnoldi 
method.  SIAM  Journal  on  Matrix  Analysis  and  Applications ,  13(l):357-385,  Jan¬ 
uary  1992. 

[34]  G.  W.  Stewart.  Introduction  to  Matrix  Computations.  Academic  Press,  San  Diego, 
California,  1973. 

[35]  G.  W.  Stewart.  ALGORITHM  506:  HQR3  and  EXCHANG:  Fortran  subroutines 
for  calculating  and  ordering  the  eigenvalues  of  a  real  upper  Hessenberg  matrix 
[F2],  ACM  Transactions  on  Mathematical  Software ,  2(3):275-280,  1976. 

[36]  G.  W.  Stewart.  Perturbation  bounds  for  the  QR  factorization  of  a  matrix.  SIAM 
Journal  on  Numerical  Analysis,  14:509-518,  1977. 

[37]  W.J.  Stewart  and  A.  Jennings.  A  simultaneous  iteration  algorithm  for  real  matri¬ 
ces.  ACM  Transactions  on  Mathematical  Software ,  7(2):184-198,  June  1981. 

[38]  J.  M.  Varah.  On  the  separation  of  two  matrices.  SIAM  Journal  on  Numerical 
Analysis ,  16(2):216— 222,  April  1979. 

[39]  D.  S.  Watkins.  Forward  stability  and  transmission  of  shifts  in  the  QR  algorithm. 
SIAM  Journal  on  Matrix  Analysis  and  Applications.  To  Appear. 

[40]  J.  H.  Wilkinson.  The  Algebraic  Eigenvalue  Problem.  Clarendon  Press,  Oxford, 
UK,  1965. 


36 


